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I. INTRODUCTION 


The Naval Space Surveillance Center (NAVSPASUR) currently 
tracks daily over 6000 objects in elliptical orbits around the 
Earth. To assist in identification and tracking of these 
objects in orbit, NAVSPASUR uses an analytic satellite motion 
model implemented in the Fortran subroutine, PPT2. This 
subroutine predicts an artificial satellite’s position and 
velocity vectors at a selected time to aid in the tracking 
endeavor. Several calls to the subroutine may be required to 
aid in the identification of one object. 

With the current increase in space operations, the number 
of objects necessary to be tracked is expected to increase 
substantially. Subroutine PPT2 provides orbit prediction 
within an adequate response time for the current number of 
tracked objects in space. However, a substantial increase in 
the number of objects will cause the use of PPT2 on a serial 
computer to become less responsive and computationally 
burdensome. Additionally, if there exists a desire to 
increase the accuracy of the NAVSPASUR model, the resulting 
subroutine would require even more computing resources and 
make achieving results even more time consuming. 

Parallel computing offers one option to decrease the 
computation time and achieve more real-time results. Use of 


parallel computers has already proven to be beneficial in 


reducing computation time in many other applied areas. 
Parallel computing offers an opportunity to both increase the 
efficiency of the current model or reduce the computational 
burden of a more accurate future model. 

The ultimate objective of this thesis is to quantitatively 
determine the parallel computing potential of the current 
NAVSPASUR analytic model and determine the subsequent 
reduction in computer time if the model is applied to a 
hypercube multicomputor. The following chapter provides a 
description of the NAVSPASUR satellite model and outlines the 
algorithm used by the Fortran subroutine, PPT2. Chapter III 
provides an overview of parallel processing and discusses the. 
methods to decompose a serial algorithm to be applied to the 
hypercube. In Chapter IV, the two methods of decomposing the 
analytic model are presented with their respective success in 
reducing computation time. The last chapter of this thesis 


provides conclusions and suggestions for future research. 


II. NAVSPASUR SATELLITE MOTION MODEL 


A. OVERVIEW 

The satellite motion model, adopted by NAVSPASUR and 
implemented in subroutine PPT2, is a general perturbations 
variation of elements model of artificial satellite motion 
around the Earth. Given a set of a satellite’s "mean" orbital 
elements at a given epoch, the model predicts the state 
(position and velocity) vector at a future time. The model 
considers perturbing accelerations caused by atmospheric drag, 
oblateness of the Earth, and asymmetry of the Earth’s mass 
about the equatorial plane. This model ignores perturbations 
due to longitudinal variation in the Earth’s gravitational 
potential and the influence of other celestial bodies such as 
the Moon or the Sun.’ 

Satellite motion models can be classified by the technique 
used to integrate a satellite’s equations of motion and the 
method to describe the variation of the satellite’s orbit in 
reaction to the perturbing forces. The two primary techniques 
to solve satellite’s equations of motion are general 


perturbations and special perturbations. General Perturbation 


‘Although the NAVSPASUR model, implemented by PPT2, 
neglects the longitudinal variation to the  Earth’s 
gravitational potential, a correction for this variation can 
be made within PPT2 by a call to a second subroutine, LUNAR. 


techniques involve an analytic integration of the perturbing 
accelerations, while special perturbation techniques involve 
the direct numerical integration of the equations of motion to 
include all perturbing accelerations. Typically, general 
perturbation techniques are more difficult and not as accurate 
aS special perturbation techniques. However, special 
perturbation techniques, in general, provide this increase in 
accuracy at a cost of several orders of magnitude of computing 
resources (Bate, 1971, pp. 385 - 414). A third technique, now 
gaining in popularity, is a semi-analytic technique. This 
technique is combination of both the general and special 
perturbation techniques. The NAVSPASUR model, a general 
perturbations model, solves the satellite’s equations of 
motion by using series solutions to the ordinary differential 
equations. 

Within these techniques exists two methods to describe the 
variations to a satellite’s orbit. One method, variation of 
elements, describes variations to the orbit in terms of 
changes in the osculating orbital elements with respect to 
time. The other method, variation of coordinates, selects a 
coordinate system and describes variations to position and 
velocity in this coordinate system with respect to time. The 
disadvantage of the variation of coordinates method is that 
the solution provides no immediate insight to the geometry of 
the orbit (Danby, 1989, “oe olo Using the variation of 


elements method, the NAVSPASUR model describes the variations 


to an orbit in terms of changes to the classical orbital 


elements with respect to time. 


B. THEORY 

The NAVSPASUR model is based on a theory developed in 1959 
by Dirk Brouwer of Yale University (Brouwer, 1959, pp. 378 - 
397) and modified by R. L. Lyddane of the U. S. Naval Weapons 
Maperatory in 1963 (Lyddane, 1963, pp. 555 - 558). This 
theory considers an Earth’s gravitational potential 
significantly more involved than the gravitational potential 
used in the classical Kepler model of idealized satellite 
motion. The classical model assumes a perfectly spherical 


Earth and the gravitational potential may be expressed by 


U= (Ze ) 


hit 


where wt is the gravitational parameter and r is the radial 
distance of the satellite from the center of a spherical Earth 
(Bate, 1971, pp. 11 - 16). The theory used in the NAVSPASUR 
model assumes only that Earth is symmetrical about the north- 
south axis. Expressing the potential in spherical harmonics, 


the Earth’s gravitational potential is modeled by 





u=E+Ly =) PustmpielCmmcosgn so, sinc] (2.2) 
q2z0 


where Rp is the equatorial radius of the Earth, $8 is the 


satellite latitude, X is the longitude, C and S,, are 


nrg 


coefficients depending on the mass distribution, and P,? are 
the associated Legendre polynomials. These polynomials are 


defined in terms of the Legendre polynomials P,;: 


P, (x) =1 
P, (xX) =x 
P_(x) = en-i Ee) = Le (x) (2.3) 








P3 (x) =(1-x?) v2 =P, (x) 


Ignoring any longitudinal variation in the gravitational 


potential, Equation 2.2 may be simplified to 





_H_by Re , 2.4 
U= = =) ~J,P, (sinB) (2.4) 


where J,=-C, ,- The even J,’s account for the oblatenesssot sen 
Barth, while the odd J,’s account for the Earth’s asymmetry 
about the equatorial axis. (Solomon, 1991, p. 3) 
1. Brouwer’s Model 

Dirk Brouwer developed his theory of artificial 
satellite motion while under contract by the Air Force 
Cambridge Research Center and published this theory in the 
Astronomical Journal in 1959. In this article, Brouwer used 
a different notation for the classical orbital elements from 
the notation commonly recognized today. This notation is also 
adopted in the later NAVSPASUR model. In order to conform to 
the notation of Brouwer’s original article and NAVSPASUR 
model, this paper will also use Brouwer’s notation listed in 


Table 2.1% 


Table 2.1 Brouwer’s Notation 
SE ee BS. ypu nn en a 


Brouwer’s Notation Common Notation 
a semi-major axis a 
e eccentricity e 
x inclination 1 
g argument of perigee w 
h ascending node Q 
£ eine sanoma. Vv Vv 
1 mean anomaly M 


Brouwer’s model considers the zonal harmonics of the 
Earth’s gravitational potential, accounting for the Earth’s 
oblateness and its asymmetry about the equatorial axis as 
expressed in Equation 2.4. To simplify the potential 
equation, his model uses only the first four non-zero terms of 


the series described in Equation 2.3 with J,=0: 





y=. 
sm 


_ WK 
r? 


Eee ean pie 2 (aunb=5san3p) 
sis OAS os 


(3-30sin’?B+35sin‘B) (2.5) 





HA. 5 
+ ~ (15sinB-70sin?6+63sin° 
aarec B B B) 


where 





Figure 2.1 Classical Orbital Elements 


A, 9=-J,R6 
Ss 4 

ky=- ae 

As o= ~ TRS 


This truncation of the series introduces an error of the order 
O(k,?), where k,=.4841605:'107-0.5V5. (Brouwer, 1963, p. 393). 

In order to derive the equations of motion for the 
satellite, Brouwer utilized the Hamilton-Jacobi theory of 
dynamical systems to express the Hamiltonian F=U-%v’* in terms 
of canonically conjugate Delaunay variables. Letting a ande 
be the oosculating semi-major axis and eccentricity, 


respectively, the Delaunay variables are: 


L=y pa l=mean anomaly 


G=y Ha (1-e”) g=argument of perigee (2.6) 


H=/ Wa (1-e?) cosI h=longitude of ascending node 
Using the Delaunay variables in Equations 2.6, the equations 


of motion become 


aruda) | OF 
ce GH Ge Ge 
dG_OF dg__0F 27 
dt dg' dt v6 li 
dh_OF dh__oF 
lee ol Clee) fez 


where the hamiltonian is 


4 
We Lk, ail 
- ee A: Uf Qo 


ae 








2 3 
ae (1-5) =,00s (2g+2f)] (2.8) 


In order to express the hamiltonian, F, in terms of only the 
Delaunay variables, Brouwer used the following Fourier series 


expansions: 


3 Zot on 
a +)° 2P,cosjl 
mm j=l (2.9) 


; a 
=c0s (2g+2f)= }° Q,cos (2g+jl) 


B psa 
The coefficients P,, Q, are power series in the eccentric. 
e. 

Using two canonical transformations through the choice 
of suitable determining functions, S and S’, Brouwer was able 
to solve the system of ordinary differential equations listed 


in Equations 2.7 in terms of the mean elements, a", e", I", 


1", g", andh". The transformed hamiltonian, F"", depends only 
on the transformed variables L", G", and H". Replacing F by 
F"" in Equation Set 2.7, Brouwer found that L", G", and H" are 


constants with respect to time and 1", g", and h" are linear 
functions of time. Consequently, from Equation Set 2567 
follows that a", e", and I" are constant with respect to time. 
Additionally, as a consequence of the transformations (see 
Brouwer, 1959, pp. 379 - 393), Brouwer was able to separate 
the changes in the orbital elements with time into secular, 
long period, and short period variations. Including “omnis 


secular terms up to order O(k,*) and periodic terms to order 


IL, 


O(k,), Brouwer found that the secular variations are a 
mumeti2on Of only a”, e", I", and t; the long period variations 
are a function of a", e", I", and g"; and the short period 
variations are a function of all six mean elements. 
Additionally, a, e, and I do not experience any secular 
variations to order O(k,*) and a does not experience any long 
period variation to order - 0(k,). Using the following 


. constants, 


a’/’=semi-major axis constant 


e’/=eccentricity constant 
I'‘Sinclination constant 


n, =Vpwa? 


Rg =Earth’s equatorial radius 


and notations, 


Vem Vimy oa Val ¥ y=¥sN° Y's=Y=e °° 


Brouwer’s formulas for computing the perturbations are:’ 


*In order to avoid confusion over notation, let 6,, 4,, 
and 6, represent the secular, long period, and short period 
variations, respectively. 


a We 





Secular terms 


8,1= 5 7/m(-1+30") 
tore oll [-15+16n+25n?+ (30-96N-90N?2) 62+ (105+14n+25n2) 04] 
1a /12 2 4 
+—~ y' ne’’* (3-3007+350%) 
16° * (2.10) 


8,951; (-15 Us) 


+ oy 7-35 +24 +2507 + (90-192n-126n?) 62+ (385+360N+45n?) 04] 
+2. y/,[21-9n2+ (-270+126n?) 62+ (385-189N?) 64] 
16 (2.11) 


§.h=-3y',0+2>y', (-5 +121 + 91/7) G+ (-35=36n-51/7) 97] 

8 (2.12) 
5 s 
+37, (5-3n’) (3-707) 0 


Long period terms 
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(2.15) 


(2.16) 


(2217) 


Short period terms 
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(2.18) 


(2.19) 


(2.20) 


(2.21) 


(2.22) 


/ 
§,n=- +2016 (om! Sie +e/'’sinf’) a Ss 1 (Dee +2f') (2 xe) 
+3e//sin(2g'+f') +e//sin(2g9/+3f’))} 


Using Equations 2.10 through 2.23, Brouwer’s algorithm 
for predicting a satellite’s state vector is as follows: 
¢ Begin with mean elements a", e", I", 1,", g)", and h,". 


e Form mean motion ny. 


n=Vpa™™ (2.24) 


e Propagate "mean" mean anomaly 1", mean argument of perigee 
g", and mean ascending node h" using Equations 2.10 - 


wee . 
a Se GeO) ) 
g=g,"+n, td .g (2.25) 
h"=h,+n,t6 
meemoly long periodic corrections to 1", g", and h" using 
Equations 2.15 - 2.17 and compute the long period 
variations 6,e and 6,I using Equations 2.13 and 2.14. 
1/=1"%+8,1 
g'=g"+8.g (2.26) 
h’=h"+8,h 


* Solve Kepler’s Equation for the eccentric anomaly E’, 
using 1’ and e" and compute the true anomaly, f’ and 
Baatus, or’ . 


E’-e"sinkE/=1"" 


a (1+e”) E/ 
me a"(1-e7) 
1+e"cosf! 


Zeapmiy Snore Pperlod Variations to 1’, g’, kh’, a", e", and 
PeicinGg quatwons 2.1865> 2.23. 
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1=1'+6,1 
g=g'+8,g 
h=h!/+8,h 
a=a"+§,a 
e=-e"+5, e+6,e 
T=I"+6,I+851 


(2528) 


° Solve Kepler’s Equation for E, using e and 1 and compute 


f andor. 
F-esinE£=1 
i (1+e) E 
tan—=,| ————- tan— 
2 (TES) 2 (2.29) 
wa aew 
1+ecosf 


¢e Compute the position vector in the conventional manner. 


x=rlcos(g+f) cosh-sin(g+f£) sinhcosTI] 
y=rl(cos(g+f) sinh+sin(g+f) coshcosI] (2.30) 
z=rsin(g+f)sinlI 


In addition to accounting for perturbations only due 
the Barth’s oblateness and asymmetry about the equatorial 
axis, this model has several shortcomings which Brouwer 
addressed in his article. The first is a singularity at 
critical inclinat ron (I.=cos?(1/V5) =63.4°) for the long period 
corrections since many of the terms have a divisor of 
(Scos*I-1). Second is a singularity for very small 
eccentricities (near circular orbits). This singularity is 
due to the appearance of e" as a divisor in the short period 
terms. Finally, there exists a singularity in some of the 


elements for very small inclinations (orbits lying in the 
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equatorial plane). Brouwer suggested that although 
Singularities in some of the elements existed for very small 
eccentricities and inclinations, no such singularity existed 
in the coordinates. Hence, the formulas could be modified and 
expressions obtained for the perturbations in coordinates for 
these special cases. (Brouwer,1959, p. 393) 

2. lLydanne’s Modifications 

Lydanne’s modifications correct for the singularities 
in Brouwer’s model due to the very small eccentricities and 
inclinations. He presented the suggested modifications in an 
article in the Astronomical Journal in 1963. 

As Brouwer suggested in his article, lLydanne and 
several other investigators believed there existed well- 
determined expressions for the coordinates of a satellite in 
the case of either very small eccentricity or inclination, 
because no singularity actually existed in the coordinates for 
the small eccentricity or inclination. However, lLydanne 
encountered difficulty in applying the approach suggested by 
Brouwer. This aissrenen seqeteees the Taylor series expansion 
of the coordinates in the element perturbation. Although the 
first-order terms were regular, the higher order terms were 
Singular. (Lyddane, 1963, p. 555) 

Lyddane suggested another approach. By formulating 
the perturbation theory in terms of Poincare’ variables 


instead of Delaunay variables, the singularities can be 


1g 


avoided. The Poincare’ variables are defined in the terms of 


the Delaunay variables as follows: 


X,=L Via 
x,=/2 (£-G) cos(gth) y,=-/2(Z£-G) sin(g+h) (2, 32) 
x,=/(G-H) cosh y¥,=-V2 (G-H) sink 


Using a method similar to Brouwer’s method of 
solution, Lyddane presented with some algebraic manipulation 


the following formulas: 


az=a"+8a > 
l+gth=l"+g"+h"+8 (1l+g+h) (2.32) 


ecosl=(e/’+$e) cosl’’-e’’dlsin1” (2.33) 
esinl=(e/’+d$e) sinl’’+e’/$lcos1” 


or 


sin (2) combs (ein 03 
eae // 
gy — 
= oe or h’/ , 
sin (>) sinh= Lain (age cos(4~) ST) sin 


+sin (5) dhcosh’’ 


where 

6 =6, +6, 
Additionally, Lydanne discovered that the use of 1" instead of 
1’ in Equations 2.21 and 2.22 introduces an error of daemmece 
order O(k,*). Since Brouwer model computes long period terms 


to only to order O(k,), Lyddane suggested that” the™ shez 


period corrections may be computed using 1" instead of 1’ 
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(Lydanne, 1963, p. 557). Lyddane claimed this approach would 
overcome the e"=Q and I"=0 singularities in the periodic 
variations. His approach removed the singularity e"=0 for all 
terms and the singularity at I"=0 for all terms except 6,I 
(Equation 2.14) (Solomon, 1991, p. 8). 

Lyddane’s algorithm for propagating an element set is 


as follows: 


¢ Compute 1", h", de, e"d1, and dI using Brouwer’s formulas 
Mecquations 2.10 — 2.23). To avoid a small difference 
between two large quantities, use the following identities 
in Equation 2.19 to compute de. 


4 all a ee a i ell 
Se a es 1 P tS 
PCOcr tie GOs +e COS fh” 
sr) (2.35) 
(2) (( 20) -yty=+ (e308 £" 
enh Nae 


+3e"cos? F"+e!@cos3 fF") 


¢e Compute a and l+g+th using Equation Set 2.32. To reduce 
amount of error introduced by finite precision, combine 
Equations 2.15 — 2.17 for 6,(l+gth) and Equations 2.21 - 
2.23 for 6,(l+gth) . 


* Compute sin (¥%I") oh. 


; ey _ sinr”dh 
sin (——) 6h=————_— (2.36) 


// 
BEOs | a 
oe 


Solve for e, I, 1, and h using Equation Sets 2.33 and 
wee34. 


¢ Compute coordinates and velocity components in the usual 
manner. 


ie) 


coshcos (g+f) -sinnhsin(g+£) cost 
f=|sinhcos (g+f) +coshsin (g+f) cost 
sin(g+f) sinI 


-coshsin(g+f) -sinheos (g-=£) cost 


G=|-sinhsin(g+f)+coshcos (g+f) cosI (2.37) 


cos(g+f) sainlI 


g- [(esinf) £+(l+ecosf) ¥] 


nva 


3.  NAVSPASUR Modifications 
a. Atmospheric Drag 

Brouwer’s model considers only the perturbations 
due to zonal harmonics (BEarth’s oblateness and asymmetry about 
the equatorial axis). For near-earth orbits, the magnitude of 
the perturbative acceleration due to atmospheric drag can be 
of the same order as the magnitude of the perturbative 
acceleration due of the earth’s oblateness (Knowles, 1992, p. 
226))- Figure 2.2 shows the relative orders of magnitude of 
the perturbative accelerations at various altitudes above the 
Earth. Fluctuations in the density and relative velocity of 
the atmosphere to the satellite make the perturbative 
acceleration due to atmospheric drag difficult to model. To 
compensate for the drag perturbation, the NAVSPASUR model 
utilizes a simplified sub-model for drag (Solomon, 1991, pp. 
9 - 10). Atmospheric drag is modeled by time derivatives of 


the "mean" mean anomaly, 1", using two parameters M, and M,;: 
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note: typical satellite 
charactenstics 

assumed for effects such as 
drag, radiation pressure, atc. 
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Figure 2.2 Perturbative Accelerations on Satellites 


a 


Geosynchronous 


(2.38) 


law Uf 
TI eS etecclik te ark te 
where t is the elapsed time from epoch Using the following 


relationships, 
m=n,(1+6.1) =n, 


and 
1 ee 


(2.39) 


1t can be assumed 
a/P xm -2 


Differentiating Equation 2.39 and working to first order, 
(2.40) 


Using m=2M, from Equation 2.38 and substituting into Equation 


2.40 yields following model for the drag effect on the semi- 








major axis: 
4a I 
a=- M, (2 5 4l) 
3m 
Assuming a spherically symmetric atmosphere with an 
exponential density variation 
Ve & 
o (r) =p,exp (- Tes (2.42) 


where p, is the density at radius r=r, and H is the scale 
height, the rates of change in the semi-axis, a, and 
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eccentricity, e, with respect to the eccentric anomaly, E, may 


be expressed as (Danby, 1988, pp. 330 - 331): 


da__s -5p,exp ( A4® cose) ‘ ERR OOSEE (1+e@CcoOSsE) 

dE H Fa eCose (2.43) 
de Aae 1+ecosE 

—=-an6ép,ex eeon) y) —— Cosh 

BE) ioc oar: x 1-ecosE 


where 0 and A are constants. Estimating the average change in 








a and e by integrating Equation Set 2.43 over one orbit to 


lowest order of e yields: 


Ae_en’, (2.44) 
Aa a 


Assuming O=1 and substituting Equation 2.41 into 2.44, the 


model for the decay in eccentricity is: 


i eae HiRes 
~e Ta__ 4e°n’ M, (2.45) 
all 3m 


b. Near Critical Inclination 
Neither Brouwer nor Lyddane were able to correct 
for the singularity in the long period terms at the critical 
inclination (I.=cos(1/V5) =63.4°). To prevent an overflow 
error in the subroutine PPT2, the factor (1-Scos*I")~ is 


approximated by the function 


: _ 2 
T2- Lrexp(-100x") (2.46) 
x 
where x=1-5cos’I". However, with a small value of x, T2 


cannot be computed directly. By using a product expansion of 


Equation 2.46, 


Ja. 


10 
T2=— (1-exp(-Bx*)) [] (1+exp (-27.x?) ) (2nanm 
x 


n=0 


where B=100/2?'. To remove the factor of x from the 


denominator, the first factor 1s approximated by 


(1 -exp(=px-)) — — Z Pen ae Seal 2.48 
a Bx) (-1) ea ( ) 


Figure 2.3 shows a comparison of T2 using Equations 2.47 and 
2.48 and 1/x in vicinity of the critical inelvnaetea 


(Solomon, 1991 ppl 0 = ee) 


Ce PPT2 
PPT2 is the Fortran subroutine which implements Brouwer’s 
model with Lyddane’s and NAVSPASUR’s modifications. PPT2 
completes several satellite propagation tasks. 
e Predicts a satellite state vector at future time. 
¢ Computes partial derivatives of the position vector with 
respect to the orbital elements (used in Method of 
Differential Correction to modify set of stored elements 


in light of current observations). 


e Predicts the time and state vector of satellite for a 
given true anomaly. 


PPT2 is composed of sections which accomplish the tasks 
named above. The sections are delineated by conditional 
breakpoints. Control over which section to execute is handled 
by a set of control variables. Data is passed to the 
subroutine through three control variables and four Common 


blocks. A complete description of the input/output of PPT2 is 
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Figure 2.3 Near Critical Inclination 


contained in Appendix A. 

NAVSPASUR represents each tracked object by a set of 
stored elements. The stored elements are the mean orbital 
elements plus two drag parameters, M, and M,. The stored 
orbital elements are the same as the classical orbital 
elements used in Brouwer’s model with two exceptions. The 
mean semi-major axis, a", is replaced by the variable, m, the 


mean "mean" motion. 


25 


m= (1+6,1) n,=(1+6,1) al!-?”? (2.49) 
The other exception is that the cosine of the mean 
inclination, cos I", is stored instead of the mean 
inclination, I". Thus the stored elements are: 1", m, M,, M;, 
e", g", h", andleos ss. 

For NAVSPASUR to use the formulas from Brouwer’s model, 
the mean semi-major axis, a", must be determined from the mean 
"mean" motion, m. Since a" is implied in the 6,1 (Equation 
2-10), no Girect Scluticnmere et pcaccible. Therefore, PPT2 
recovers a" using an initial approximation for a" and solving 


Equation 2.48 for a" iteratively. The initial guess for a" is 


a,'=m" 2) Then, “for — 
Y’ 2 
2i ei, 4 
a7 
| as 2 
4i 1] = aane 
aa N 


(0,1) San (=2 302) 


— /M[-15+16n+25n? 


+ (30-96n-90n?) 02+ (105+14n+25n?) 0%] 
+527 aMe!!? (3-300? +358") 


(2.50) 


and the mean semi-major axis, a"), 18 taken to be equal to a," 
(Solomon, 1991, a5 -).1 >). 
Another difference between the model and actual 


computations by PPT2 is the computation of the secular 


Zi 


corrections for the mean argument of perigee, g", and mean 
ascending node, h". The secular corrections for g" and h" are 
computed in terms of mean anomaly, 1", instead of time using 


d.g and d,h from Equations 2.11 and 2.12. 








Ag ag” 
aq ede = deme 0. (OL7 (2.51) 
‘dl’ di’  m  ™~ mas 
dt 
dh’’ dh’’ 
dh’! _ dt z dt _1,0,h _ 6h (2 52) 
di’ div  m ~ m~ ma??? | 
dt 


Once the secular corrections to the "mean" mean anomaly are 
computed via Equation 2.38, Equations 2.51 and 2.52 are used 


memeorrect g" and h". 


dl (2.53) 


where 


Al=mt+M,t?+M,t? 

With all of its conditional breakpoints, PPT2 completes 
only those tasks required by the user. Prior to completing 
any of the fore-mentioned tasks, the user is required to make 
an initial call to subroutine PPT2 to recover a" and compute 
the secular corrections. During this initial call, many 


variables are set which will be used in subsequent calls, 


Za, 


increasing efficiency. Therefore, at least two calls to PPT2 
are required to complete any of its tasks.’ 

Because the main objective of this thesis is the 
parallelization of the satellite state vector prediction task 
of PPT2, only this algorithm will be presented. For a 
complete description of the other tasks see (Solomon, 1991) .‘ 
The algorithm implementing the NAVSPASUR model is as follows: 


e Begin with the stored mean elements plus the drag 
correction terms (1,", m, M,, M3, €o", Go", No", andueocmias 


e Compute T2 using Equations 2.47 and 2.48. 


e Recover mean semi-major axis, a,", from the mean motion, 
m, by iteration using Equation Set 2.50. 


e Compute the following dimensionless quantities: 
k, Ay; 6 k, = As 6 


Lierrrem Mer Cmmyri US me (2.54) 
Yo-Yo OY G=YN 4 eX ee 


° Compute drag corrections for the semi-major axis, a", and 
eccentricity, e", using Equations 2.41 and 2.45. 


e Using Equation 2.38 and Equation Set 2.53, propagate the 
mean anomaly, 1", argument of perigee, g", and the 
ascending node, lover considering only the secular 
corrections. (h" may be optionally corrected for the 
Earth’s rotation using Equation 2.56, where @ is the 
EFarth’s angular velocity and T, is the time at which the 
direction of the Greenwich meridian and equinox direction 
coincides. 


*For a complete description of calling options of PPT2, 
see (Solomon, 1991,7 pp. 1i¥a 7244) 


‘A complete listing of subroutine PPT2 is contained in 
(Solomon, 199) Sere ocoe ao oe 
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Lal +Al 
x dg 
J =Iq'' + = Al (2.55) 
nahi + ay 
ai 


h’=h’’-@(t-T.) [optional] (2.56) 


¢ Propagate the eccentricity, e", and the semi-major axis, 
a”. 


e’’=min[max[0,e,/+ét], .99999] 


Piast | 
a’’=max[1, a,//+at] ( 


e Solve Kepler’s Equation using Steffensen’s Method and use 
this result to compute cosine and sine of the true anomaly 
and radius. 


E'l21]'/+e"' sing” 


cosE"’-e”’ 


Oo —————— : 
l-e [Goce | (2 58) 
sinf'= yl-e/’? Sine? 


1-e’’cosE’’ 
IIa 
Ae a’’T} 


a | 
L+e//+cosf!'’ 


* Using Equations 2.13-2.17, compute long period correction 
terms for e, 1, h, and I in the following forms replacing 
g’ and (1-507)7* with g" and T2, respectively: 

§,e=VLElcos2g''+VLE2sing’’+VLE3sin3g"’ 
e//$,1= (VLE1sin2g"’ -VLL2cosg’! -VLE3cos3q'") 
sinI’’$,h=VLH1Isin2g//+VLH2Icosg’’+VLH3Icos3g/’ (2.59) 


e//d,e 
OS 
Ti*t ant’ 


where 


Zo 


SY 4 


VLEI=e!’N? (= y/,[1-1167-400% T2] -— “— -30?-80*- T2]) 








g2- VW sini" | / Y's 
ay’, ee: 
35 Y's 

3a4y,, 

By: 
a 


(4+3e//*) [1-90?-240*- 72] ) 


VLE3=- e//*n?sinl’’ [1-567-166*: T2] 


iL 
VLL2=VLE2+ el ’sinI’’ [1-907-2404: T2] 
Z 


VLH1I=e’/*@sinI’”’ (-~2 : 


/ 


° > Y [3+160*- T2+409*- T27]) 
a 


[di +8098*> TZ. 200G aT 2. 











VLH2I=— mt 3° : = 2 (4232 —) (1-90-2407) 
8 
= (4+3e//7) [3+1602- 72+400* 722]] 
35y/ 
VLH3I=-——! § e/@{+ [1-507-1604- T2] 
Oye Ya 


+sin?I’/ [5+320?- 72+800*: T27]} 


* Representing the quantity (1+g+h) by 72, compute 
6,2=6,1+6,g+6,h by combining individual parts prior to 
computing. 


6, Z=VLS1sin2g'!+VLS2cosg''+VLS3cos3q"' (2.60) 


where 


= (nN?-1) / = 2— 4. _ 10y", es 25 a7 
VLS1 —q— [¥'2(2 110*-400*%- T2) Sy 307-80: T2) } 


2 
a Ly’, (11+8002- 72+2000* 722) -y', (3+160?: T2+400* T2) } 
Y's 


+256 (0°32 ye ~——=) -2 
-y', (1-907-4004- 72) 


Ly’, (1-3307-2000*- T2) 
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wt 








VLS2= el’ sinz!( = [ (n+ cH ee) 17 oes) 904-240 72) } 
1201's (ett age [1- 902-2404: T2] 
64y', 
+ me 5@(1-0) [(4+3e/’") (3+160?- T2+4004- 727) 3} 
BZ’, 
Wee sint{-oe [3n?-3- (2+ 0 )e’/7) [1-502-1604- 72] 
NIIP 1+8 
357"; 2 
- > [e//78 (1-0) (5+406?: 72+8004- T2?) }} 
BOY’. 


° Using Equation 2.19 and Equation Set 2.35, compute the 
short period correction term for e, replacing g’ and f’ 
with g" and £", respectively. Then combine long and short 
period corrections to form de. 


/ 
°) e= 5 2{(-1+367) (e!’cos?f'!+3e!'cos?f!! 


Z 


+3cosf'/+e"’ (H+ 





1 
ivy?! (2.61) 


+3 (1-07) [e’/*cos? f/' +3e/'cos?f”’ 
+3cosf'’+e’'cos (2g/'+2f"’) ] 
-1)? (1-07) [3e/’cos (2g//+f'’) +e’/cos (2g//+3f'’)} 


de=5,e+5.,e (2.62) 


e Using Equations 2.20 and 2.21, compute the short period 
correction terms for I and 1 in the following form 
replacing g’ and f’ with g" and £", respectively. Then 
combine respective long and short period terms to form $I 
and edl. 


cul 





/ 
6,I= = 8sinI” [3cos (2g//+2f'’) +3e//cos (2g!// +f") 
+e/' cos (2g/'+3f"") J 


3ay/ 
e/'$,1=- I ={2 (-1+36?) 





// ets // // 
x [ (l+e’’ cosf'’) (2+e'' cosf'’) Al) oaaya! (2.63) 
+3 (1-07) 
ai Hd // // 
x [ (ie (l+e’’cosf’’) (2+e°' cosf'’) )sin(2g//+f!’) 


N 
ey (l+e’’cosf’'’) (2+e!'’cosf'’) +3) 


1 
xX Sinica om Je 


OI=0,1+0,2 . 
SRE ee, i (2 ee) 
e Using equation 2.23, compute short period correction for 
h in the following form. 
y’ 
sinI''$,h=-—sinI''0[6 (ey a eee) 
=“38i1nN (2g “Ze ja3e Sini2o 2.) 
+e/'sin(29/'+3f"") ] 


(2.65) 


* Using Equation 2.36 and relationship dh=6,h+d,h, compute 
sin (¥%I") dh. 
i sinI’’ (6,h+6,h) 
sin(=-) 0 b= > + (2. 66) 
2cos (=) 


* Combine terms of $,z=6,1+6,g+65,h, replacing g’, 1’, and f’ 





with g", 1", £", respectively. Then compute zZ. 
1 
(q+ -1) 
§,z=-e//78,1 
ne 
y/ (2.67) 
— [6 (1-20-5067) (£1 -e”’ sinf’’-1") 


~ (3+20-507) (3sin(2g’/+2f"") 
+3e"'sin(2g +Z2f je Sinkizg se a) 


Be 


z=l!/+g!' +h!'/+8,2+8,z (2.68) 


Compute a using Equation 2.18. 
a=a’’+d,a (2.69) 


serve Equation Sets 2.33 and 2.34 explicitly for e, 1, I, 


and h. 
e=y (de) 7+(ed1)? 
Perc? ( desinl’’ tedlcosl”’ 
Secosl’’-edlsinl” 
cosI=1-2[ (62)*+(sin (>) 5h)?] (2.70) 
O6Isinh’’+sin i= dhcosh’’ 
De Cane (ee ee 
d6Icosh’’/-sin eS dhsinh’’ 
Compute g. 


g=z-l-h (2.71) 


Solve Kepler’s Equation again using Steffensen’s Method 
aoc computcemcos £, sin f, and rx. 


E=l+esinE 


cosE-e 


1-ecosE 
Vl-e*? sinE 
1-ecosE 
aT’ 


jap ee 
l+e+cosf 


cosf= 
(2.72) 


Sas 


Compute the satellite state vector using Equation set 
Tp oe ior 
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III. PARALLEL COMPUTING 


A. OVERVIEW 
1. Definition 

The complexity of scientific computing today demands 
faster computers. Greater detailed models require a 
substantial amount of computation. Faster computers are 
needed to provide the results of the computation in a timely 
manner. In response to this demand, computer engineers have 
taken two approaches to achieve faster performance. 

The first approach is to increase the speed of the 
Clireuleny. Although great advances have been made in 
increasing the speed of computer circuitry, this increase in 
speed is bounded by the speed of light. Additionally, the 
specific design and manufacture requirements for further 
increases in speed are quite costly. 

The second approach, parallel computing, provides an 
alternate means to achieve faster computer performance using 
affordable circuitry design: Many articles and books have 
been written describing the methods to exploit this approach. 
The terms parallel computing and parallel processing seem to 
be used interchangeably in these texts. For the purpose of 
this thesis, parallel computing and parallel processing are 


assumed to be synonymous. In this emerging field, there 
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exists slight differences in how to define parallel computing. 
One definition which best encompasses the breadth of the field 
may be taken from (Hwang, 1984, p. 6): 


Parallel processing is an efficient form of information 
processing which emphasizes the exploitation of concurrent 
events in the computing process. Concurrency implies 
parallelism, simultaneity, and pipelining. Parallel 
events may occur in multiple resources during the same 
time interval; simultaneous events may occur at the same 
time instant; and pipelined events may occur in overlapped 
time spans. These concurrent events are attainable ina 
computer system at various processing levels. 


In his book, Hwang describes the processing levels to 


be: the program level, the task level, the inter-instruction 


level, and the intra-instruction level. The program level 
involves executing multiple programs by means ob 
multiprogramming, time sharing, and multiprocessing. This 


level is concerned with the design of parallel processing 
systems which is beyond the scope of this thesis. Therefore, 
for the purposes of this paper, the definition of parallel 
computing is defined as the efficient form of information 
processing emphasizing the concurrent computations’) and 
manipulation of data to solve a single problem. 
2. Classification of Parallel Computers 
a. Type Classifications 

Implicit in the definition of parallel computing 
are three methods to achieve parallelism. The three methods 
are temporal parallelism, spatial parallelism, and 


asynchronous parallelism (Hwang, 1984, p. 20). These methods 
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offer a manner to classify the various types of parallel 
computers. 

The first type is a pipeline computer. Pipeline 
computers perform overlapped computations to exploit temporal 
parallelism. Computations are divided into a number of stages 
or segments with the output of one segment being the input of 
another. Analogous to a factory assembly line, if each 
segment works at same speed, the work rate of the pipeline is 
the sum of work rates of the segments. The maximum work rate 
is achieved once the pipeline is full. An example of a 
pipeline computer is the Cray-l. 

The second type iS an array processor. Array 
processors use multiple synchronized processing elements to 
achieve spatial parallelism. Each processing element performs 
Simultaneously identical operations on different data. An 
example of an array processor is the Connection Machine. 

The tina rd type is a multiprocessor. 
Multiprocessors may achieve asynchronous parallelism through 
a set of interactive processors (nodes). These pLOce sea are 
capable of performing independent operations, but share 
resources such as memory. An example of a multiprocessor is 
the Cm* of Carnegie-Mellon University. 

The final type is a refinement of the 
multiprocessor, the multicomputer. Multicomputers, like 
multiprocessors, achieve asynchronous parallelism through a 


set of interactive processors. But these processors each have 
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their own local memory. An example of a multicomputer is the 
INTEL iPSC hypercube. Because each processor has its own 
memory and may perform independent operations, multicomputers 
offer the user an added degree of freedom in programming. 
However, interaction between the processors (nodes) may 
require synchronization to be explicitly programmed in the 
multicomputer code. 

The four type classifications are not necessarily 
mutually exclusive. Many commercially available array 
processors, multiprocessors, and multicomputers employ 
pipeline processors to complete operations such as vector 
processing. 

b. Architectural Classifications 

Parallel computers may also be classified 
according to their architecture. One scheme for classifying 
digital computers was introduced by Michael J. Flynn in 1966. 
He introduced a scheme to classify computers into four 
categories based on the multiplicity of instruction and data 
Streams. An instruction stream is a sequence of instructions 
to be executed by the computer. Likewise, a data stream isa 
sequence of data used by the computer. Flynn’s four 
categories are (Flynn, 1966): 

1. Single instruction stream, single data stream (SISD). 
Most serial computers fall in the SISD category. 
Although instructions are completed sequentially, this 
category includes overlapping instructions (pipelining). 


Therefore, pure pipeline processors also belong to this 
category. 
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2. Single instruction stream, multiple data stream (SIMD). 
Array processors fall into this category. The array 
processor receives a single set of instructions, but each 
element receives and manipulates its own set of data. 

3. Multiple instruction stream, single data stream (MISD). 
No current computers fall into this category. This 
architecture has been challenged as impractical by some 
computer designers (Hwang, 1984, p. 34). 

4. Multiple instruction stream, multiple data stream (MIMD). 
Most multiprocessors and multicomputers fall into this 
category. The INTEL iPSC is a MIMD machine. 

c. Topological Classifications 
Another classifying scheme for parallel computers 
1s by the topology of the inter-processor connections. These 
connections are the means through which communication between 
individual processors is conducted. This classifying scheme 
applies only to array processors, multiprocessors, and 
multicomputers. Some of the general topologies are the mesh, 
the pyramid, the butterfly, and the hypercube. Figures 3.1 
and 3.2 show examples of the mesh and hypercube topologies. 
The topology may also be customized to meet specific computing 
needs. For a more comprehensive discussion of the various 
topologies see (Quinn, 1987, pp. 25 - 30). 
3. Measurements of Performance 
With faster computation speed being the ultimate 
objective, certain measures are needed to determine the 
effectiveness of parallel computing versus serial computing to 


achieve this objective. Computation speed depends on many 


factors that include the computer hardware design, the 
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Figure 3.1 Two-dimensional meshes 


technical specifications of its components, and the algorithm 
or method of solution used to complete the computations. Two 
common measures of effectiveness, accounting for both the 
hardware and the algorithm, are speedup and efficiency. 
peeedun mes ne fers CO mche Srattoebetween the time 


taken to execute a set of computations serially, T,, and the 
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Figure 3.2 Hypercubes of dimension zero through four 


time taken to complete the same set of computations exploiting 


para el ronal, 


Si =e (3 
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Although speedup compares the time taken for the serial 
computer program and the parallel computer to complete the 
same set of computations, this set does not imply both 
programs follow the same algorithm. Parallel programs often 
contain additional operations to accommodate parallelism. In 
order not to be misleading, speedup should compare the 
parallel computer program with the most efficient serial 
computer program. Many suggest that times, T, and T,, be 
measured using a particular parallel computer and the fastest 
serial computer. However, the variation in the technical 
specifications of both computers may cloud the issue whether 
parallel processing is beneficial. To be an effective 
measure, the computing technical specifications of the 
individual processor of the parallel computer and the serial 
computer should be equal. Therefore, for the purpose of this 


thesis, speedup, S§S 1s measured by the ratio of time, T,, 


p?’ 
taken by the parallel computer executing the most efficient 


serial algorithm and the time, T,, taken by the same parallel 


pr 


computer executing the parallel algorithm using p processors. 


ye (3.2) 


The other measure, efficiency, accounts for the 
relative cost of achieving a specific speedup. Relative cost 
is measured as the number of processors required to achieve 


the speedup. Efficiency, E is the ratio between the 


pf 
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speedup, S,, and the number of processors, p (the theoretical 


p/ 


speedup). 


E =f (3.3) 


Many factors could possibly limit the possible speedup 
and efficiency of a parallel program. These factors include 
the number of sequential operations that cannot be 
parallelized, the communication time between individual 
processors, and the time each processor is idle due to 
synchronization requirements. Many have argued these factors 
severely restrict the benefits of parallel computing. Despite 
these factors, research has shown parallel computing can be an 
effective means to reduce computation time (see Quinn, 1987, 
pp. 18 - 20 and Gustafson, 1988). Considering only the number 
of sequential operations in a program that cannot be 
parallelized, Amdal’s Law states that the maximum speedup, S,, 


achievable by p processors is: 


1 


= .4 
Se <SI-FV7B ——_ 


where f is the fraction of operations that must be performed 
sequentially (Amdahl, 1967, pp. 483 - 485). Equation 3.3 
provides an initial means to determine if an algorithm is a 


good candidate for parallelization. 
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Be INTEL iPSC/2 HYPERCUBE 

To maximize speedup and efficiency, parallel algorithms 
must be developed with a specific parallel computer in mind. 
In determining the parallel computing potential of the 
NAVSPASUR satellite motion model, an INTEL iPSC/2 hypercube 
computer, located at the Department of Mathematics at the 
Naval Postgraduate School, was used. The iPSC/2 is a MIMD 
multicomputer with a hypercube topology. The iPSC/2 consists 
of a system resource manager and eight individual processors, 
called computing nodes. The system resource manager, often 
called the host, provides the interface between the user and 
the computing nodes. The host is a 386-based computer, which 
may be used to process data in addition to providing the 
interface for the user. 

The computing nodes are complete, self-contained INTEL 
80386 microprocessors. Each computing node also contains a 
80387 numeric coprocessor, its own local memory, and a Direct- 
Connect communications module (DCM). Each computing node may 
be augmented by a Vector Extension (VX) module for pipelined 
vector operations. The iPSC/2 located at the Naval 
Postgraduate School contains only one node with the VX module. 

Communications among the nodes and the host are completed 
through message passing. The Direct-Connect Module (DCM) 
allows messages to be sent directly to the receiving node 
without disturbing the other node processors. Other hypercube 


designs require messages to be stored and forwarded along a 
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path of connected nodes until the message reached the 
receiving node. 

The iPSC/2 uses a UNIX operating system and may be 
programmed in Fortran and C languages. A more detailed 
It SEing of the INTEL 1PSC/2 hypercube’ s technical 


specifications is contained in Appendix B. 


C. METHODS OF PARALLELIZATION 
1. Vectorization 
Vectorization is one method to parallelize an existing 
sequential program. Vectorization 1s the process of converting 
blocks of sequential operations into vector instructions that 
may be pipelined. A simple example of vectorization using 


Fortran is the following: 


Sequential Code: 
Do 10 i=1,N 
10.2 ee Oey) 
Vector Code (VAST2): 


Call Vada(n x70, 1) 27) 


To assist in the vectorization of a serial program, there 
exist many commercially-available vectorizing compilers. 
(Oulnn;. "1997 - ppae2e 2 a Zo) 

Vectorizing compilers automatically vectorize 


sequential program code for execution. Additionally, they may 
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identify to the user program constructs and data dependencies 
that limit potential vectorization. Vectorization cannot be 
maximized solely by a compiler. Most vectorizing compilers 
have a limited ability in recognizing sequential blocks to be 
vectorized and translations may not be always’) straight 
forward. INTEL iPSC/2 contains the vectorizing compiler, 
VAST2. The VAST2 complier supports only Fortran programs and 
is limited to vectorizing only do loops and if statements. 
(1PSC/2 VAST2 User’s Guide, 1989) 

2. Dastributing Computations 

By parallelizing tasks on individual processors, 
Vectorization provides only the first level of parallelism. 
In order to partition a program into parallel tasks to 
distribute among the processors of a multi-computer, a 
different strategy is needed. Although there exist many 
commercially-available vectorizing compilers, compilers which 
identify higher levels of parallelism have not been as 
successful. Therefore, the task of developing an algorithm to 
efficiently distribute computations among several processors 
is left to the user. 

Performance of parallel algorithms may be radically 
different for different parallel computers. A number of 
factors such aS processor speed, memory access time, and 
memory capacity can affect an algorithm’s performance. Hence, 


the strategy to parallelize an algorithm must be developed 
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with a specific parallel computer inmind. The multicomputer 
with each node having its own memory provides the greatest 
flexibility to the user. For a multicomputer, the user must 
partition the problem among the processor nodes. The 
hypercube topology allows the user to use the natural topology 
of the problem to decompose the problem into parallel 
processes. A process is defined as a single statement or a 
group of statements which are a self-contained portion of the 
total computations. Using the INTEL iPSC/2, two decomposition 
strategies are suggested (iPSC/2 User’s Guide, 1990. pp. 4-1 - 
4-6): 
¢ Control Decomposition 
¢ Domain Decomposition 
a. Control Decomposition 

Control decomposition is the strategy of dividing 
tasks or processes among the individual processors (nodes). 
This strategy incorporates a divide and conquer approach. 
Control decomposition is recommended for problems with 
irregular data structures or unpredictable control flows. 

One method of control decomposition is for the 
parallel program to self~-schedule tasks. For this method one 
node assumes the role of a manager with the remaining nodes 
assuming roles as workers. The managing node maintains a list 
of processes to be accomplished and assigns a processes to the 


working nodes. The working nodes request jobs, receive 
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processes, and perform the indicated tasks. Implied in the 
self-scheduling method is the cost of one processor to perform 
the manager duties. (iPSC/2 User’s Guide, 1990, p. 4-4) 

A second method of control decomposition is to 
pre-schedule the processes. The exact tasks required of each 
node are explicitly stated in the parallel program. Although 
this method saves the cost of the managing node, care must be 


taken to ensure the processes are evenly distributed among the 


macaes . 
b. Domain Decomposition 
Domain decomposition is the strategy of dividing 
the input data or domain among the nodes. The partitioned 


sets of domain may be specific data sets such as blocks of 
matrix or represent a specific grid such as used in finite 
difference or finite element methods to solve partial 
differential equations. The major difference between control 
and domain composition is that domain decomposition strategy 
requires each node to perform essentially the same tasks but 
with different input data. 

Domain decomposition is recommended if the 
calculations are based on a large data structure and the 
amount of work is the same for each node. An example of 
domain decomposition is multiplying two large matrices by 
block multiplication. Although domain decomposition may seem 


perfectly parallelizable and thereby very efficient, user must 
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use caution to ensure each input data set requires essentially 
the same amount of work. 
3. Improving Performance 
The decomposition of a problem may require the use of 
the control, domain or a hybrid of both strategies to be 
efficient. Once a specific strategy 1s chosen, several 
factors should be considered to improve the performance of the 
parallel algorithm. Those factors include: 
¢e Load balance 
° Communication to computation ratio 
e Sequential bottlenecks 
a. Load Balance 
Load balance refers to the degree to which all 
nodes are active. If the work is not evenly distributed among 
the nodes, the parallel algorithm will show constrained 
speedup. Load balancing may be achieved by decreasing the 
grain size of the parallel tasks, self-scheduling tasks, or 
redistributing the domain. Grain size refers to the relative 
amount of work completed in parallel. Pipelined vector 
operations is an example of small grain parallel computing and 
distributing computations may be considered as large grain 
parallel computing. 
b. Communication to computation ratio 
Communications to computation ratio is the ratio 


between the time spent communicating and the time spent 
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computing. Except for perfectly parallel problems, time lost 
for communications is inherent in parallel algorithms. A 
large communication to computation ratio constrains a parallel 
program’s performance. The objective is to maximize the time 
a node spends computing and to minimize the time spent 
communicating. Reductions in the communication to computation 
ratio may be accomplished by increasing the grain size, 
grouping messages, or recalculating values instead of 
receiving the value from another node. 
c. Sequential Bottlenecks 

Sometimes tasks cannot begin until completion of 
a previous task, limiting number of tasks that can be 
completed in parallel. A sequential bottleneck is the 
circumstance of other processors waiting for another processor 
to complete a task before they may continue. The portion of 
operations that are not completed in parallel can 
substantially restrict speedup as can be seen by Amdahl’s Law 
(Equation 3.3). Inherent in sequential bottlenecks are any 
requirements of the nodes to synchronize. The only method to 
remove sequential bottlenecks is to modify or reorder the 
algorithm in order to overlap sequential code with other 


computations. 
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IV. PARALLELIZATION OF PPT2 

The purpose of this research was to determine the 
potential reduction in computation time for the NAVSPASUR 
satellite motion model through parallel computing. This 
potential may be assessed by determining the relative speedup 
and efficiency of various parallel algorithms employing the 
methods and strategies of parallelization discussed in Chapter 
A Bi) te ee 

As stated in the previous chapter, the strategy for 
developing parallel algorithms depends heavily on the 
architecture and topology of the parallel computer used. Due 
to ease of access and familiarity with the INTEL iPSC/2 
hypercube, the parallel computing potential’“of the NAVSPASUR 
model was assessed with respect to implementing the model on 
this specific multi-computer. Although performance of various 
algorithms may vary somewhat depending on the specific 
parallel computer, it was hoped that some generalizations may 
be made from the application of the NAVSPASUR model to this 


hypercube. 


A. VECTORIZATION 
The first method of parallelization considered for the 
NAVSPASUR model was vectorization. Vectorization is usually 


Simpler than the other methods of parallelization to apply. 
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Additionally, if vectorization proved to be beneficial, it may 
be incorporated with the other parallel computing methods in 
order to realize even greater speedup and efficiency. 

The realized speedup due to vectorization is a function of 
the number of vector operations within a specific algorithm. 
Vector operations in Fortran are usually characterized by do 
loops containing scalar operations performed on each element 
of an array. With each node possessing its own vector co- 
processor (VX module), these do loops may be replaced by 
single calls to canned subroutines. These subroutines utilize 
a vector co-processor to perform pipelined vector operations, 
Significantly reducing computation time. In addition to the 
explicit vector operations within an algorithm, sometimes 
there exist blocks of scalar operations that may easily be 
transformed into vector operations. Scalar operations 
contained within Fortran do loops and logical if statements 
are usually good candidates. 

Analysis of the Fortran subroutine PPT2 shows that the 
current subroutine contains very few explicit or implicit 
vector operations. The only apparent vector operation in the 
satellite state vector prediction portion of PPT2 is the 
computation of the velocity vector at the very end of the 
algorithm. The propagation of the orbital element set 
comprises the majority of the computations. The formulas used 
to propagate the orbital elements, presented in Chapter II, 


may be characterized as lengthy, algebraically-complex, non- 
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linear scalar functions of the mean orbital elements. 
Attempts to transform these formulas into a set of vector 
operations quickly become algebraically overwhelming and a 
successful transformation is highly improbable. 

Likewise, with the exception of the computation of the 
variable T2, the scalar operations contained in the do loops 
and if statements of PPT2 demonstrate limited vectorizing 
potential due to data dependency within the loops and 
statements. Therefore, based on this initial assessment of 
limited vectorizing potential, vectorization was noe 
considered as a viable method to reduce computation time and 


efforts to vectorize PPT2 were pursued no further. 


B. DISTRIBUTION OF COMPUTATIONS 

With vectorization deemed as not a viable method of 
parallel computing for the NAVSPASUR model, any reduction in 
computation time needed to be achieved through the method of 
distributing computations. Both strategies of control 
decomposition and domain decomposition were considered. 

In order to better appraise the potential reduction of 
computation time by implementing each strategy, separate 
parallel algorithms utilizing the different strategies were 
developed and evaluated with respect to the measures of 
speedup and efficiency. Although a combination of both 
strategies may possibly provide the greatest speedup and 


efficiency, the evaluation of separate algorithms implementing 
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the respective strategies exclusively provided a better means 
to determine how the majority of the reduction in computation 
time was achieved. Additionally, once the relative benefit of 
each strategy is determined, it would not be difficult to 
incorporate both algorithms together on the hypercube. 

Using the two distinct strategies, two separate sets of 
programs were developed and evaluated. Each program set 
consists of two Fortran programs to be executed on the INTEL 
iPSC/2. A host (system resource manager) program acquires a 
specified size cube; loads the node program on the processors 
of the attached cube; and, upon completion of the algorithm, 
releases the cube for another application. The node program 
implements the parallel algorithm. Although the host may also 
serve as an additional processor, the parallel algorithms 
utilized only the nodes of the iPSC/2.° 

Program set named P°T-A implements an algorithm using the 
control decomposition strategy and the program set named P°T 
implements an algorithm using the domain decomposition 
strategy. Descriptions of the algorithms and an assessment of 
their respective results are contained in the subsections 


below. 


"The routine used to determined run times for the various 
programs is measured differently for the host and the nodes. 
In order to obtain comparable times to compute speedup and 
efficiency, the actual parallel algorithms utilize only the 
node processors. (iPSC/2 Programmer’ s Reference Manual, 1990, 
free o-174.) 
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1. Control Decomposition -- P°T-4 
The strategy of control decomposition is to reduce the 
NAVSPASUR model’s computation time by the concurrent 
completion of separate tasks (processes) by the individual 
nodes of the hypercube. By reducing the computation time 
necessary to predict each individual satellite’s state vector, 
an overall reduction in computation time to predict state 
vectors for all tracked objects can be achieved. Hence, the 
ultimate objective of the program set, P°T-4, was to reduce 
the computation time for a single object in orbit. 
a. Algorithm 
In order to predict a satellite’s state vector 
considering the secular and periodic correction terms due to 
the zonal harmonics and a correction term for each element due 
to the sectoral harmonics, the NAVSPASUR model requires the 
completion of 55 major tasks. The majority of tasks are 
evaluation of the formulas outlined in Chapter II, some tasks 
are a group of computations such as the group of computations 
necessary to compute the variable T2 or the group of 
computations to solve Kepler’s Equation by Steffensen’s 
Method. 
The first step in partitioning these tasks among 
the nodes was to determine which tasks could be completed 
concurrently. Concurrency was determined by the development 


of a hierarchy of the formulas used by the NAVSPASUR model. 
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Each of the individual tasks were listed with its respective 
required input. Tasks which required output from the 
completion of other tasks were listed below those tasks. 
Tasks which could be executed concurrently were listed on the 
same level of the hierarchy. Figures 4.1 and 4.2 contain an 
extract of this hierarchy. 

From this hierarchy of formulas, the number of 
tasks that could be completed concurrently at each level of 
the hierarchy ranges from 2 to 14. The levels where only a 
few (less than four) represent potential sequential 
bottlenecks. These sequential bottlenecks needed to be 
overcome for a high level of efficiency to be achieved. 

Additionally, the number of FLOPS required varied 
considerably among the tasks. Some tasks required as few as 
2 FLOPS, while other tasks required over 200 FLOPS. For 
example, solving Kepler’s Equation by Steffensen’s Method 
could require as few as 3 FLOPS or as many as 650 FLOPS based 
on the speed of convergence.® This variance in the number of 
FLOPS required by the various tasks presented a potential 
problem in load balancing. 

The second step in applying this strategy was to 
determine the method of scheduling the tasks to be 
accomplished among the nodes. A manager-worker algorithm as 
described in Chapter III provides an easy method to achieve 


‘Tf the error tolerance of less than 107° is not met, the 
NAVSPASUR model halts Steffensen’s Method after 20 iterations. 
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Figure 4.1 Hierarchy of NAVSPASUR Formulas 


load balancing among the nodes. However, for a large number 
of small tasks as is the case with the NAVSPASUR model, this 
type of algorithm can become communication intensive. A large 
communication-to-computation ratio can severely limit speedup 
and could possibly cause a parallel algorithm run longer than 


the original serial algorithm. In an effort to reduce the 
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Figure 4.2 Hierarchy of NAVSPASUR Formulas (continued) 


amount of communication in the algorithm, an algorithm that 
pre-scheduled tasks was chosen. 

With pre~-scheduled algorithms, each node knows its 
own tasks to accomplish without communicating with a "manager" 
node. Additionally, the absence of a "manager" node frees one 


more node to assist in completing the required tasks. Despite 
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these positive points, pre-scheduled algorithms are not 
without their own drawbacks. Pre-scheduling algorithms do not 
provide automatic load balancing as is the case with self- 
scheduling algorithms. Care must be taken to ensure each node 
does essentially the same amount of work. Also, pre- 
scheduling algorithms require a fixed number of nodes. 
Requiring a fixed number of nodes restricts the flexibility of 
algorithm and its potential speedup. 

The third step in applying this strategy was to 
determine the number of nodes for pre-scheduling. Factors in 
determining the number of nodes or cube size include the 
potential requisite speedup, the amount of computation 
completed between communication messages, potential sequential 
bottlenecks, the number of messages required, and efficient 
use of all of the nodes. In an attempt to achieve an 
appreciable speedup, the algorithm was developed to use a 
minimum of four nodes. 

The final step in applying this strategy was 
assigning specific tasks to each node. Load balancing and 
potential synchronization problems were considered in the 
assignment of the tasks to the respective nodes. Often, 
communication distances are also considered in assigning tasks 
to the nodes; however, with such a small number of nodes the 
communication distances were negligible. The diagram in 
Figure 4.3 depicts how the tasks were distributed among the 


four nodes. Some of the smaller initial tasks were duplicated 
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by several nodes in order to limit the amount of communication 
and eliminate potential synchronization problems at the 
beginning of the algorithm. A complete listing of the source 
code for program set, P°T-4, is contained in Appendix C. 

Bb. Assessment 

To establish a baseline to compare the performance 
of the parallel program sets with the serial subroutine PPT2, 
execution times for PPT2 to predict the position of a single 
Satellite and a set of satellites, ranging from 12 to 20736, 
were measured. The measured times were the elapsed time for 
the execution of PPT2 in milliseconds on a single node using 
the node’s internal clock. Using ten different sets of 
Satellite data, the mean execution time for propagating a 
Single satellite was 11.2 milliseconds. The mean execution 
time for PPT2 to propagate 12 to 20736 satellites is depicted 
in Figure 4.4. These mean execution times were used to 
compute the speedup and efficiency of both parallel program 
sets. 

(1) Results. Program set P°T-4 was executed with 
the same sample satellite sets as were used with PPT2. The 
graph in Figure 4.5 shows a comparison of the mean executions 
of P°T-4 and PPT2 for a various number of satellites. As one 
can see, P°T-4 was nearly two times slower than the original 


serial subroutine. 
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Figure 4.3 P°T-4 Algorithm 


For a single satellite, the mean execution time 
for P°T-4 was 23.3 milliseconds as compared to only 11.2 
milliseconds for PPT2. A closer look at where the time is 
spent reveals the shortcomings of this parallel algorithm. 


Table 4.1 shows a comparison of mean execution times for the 


one node executing PPT2 and the four nodes executing P°T-4. 
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Figure 4.4 PPT2 Execution Times 


The times expressed in the table are the mean for the ten 


sample satellites. The communication time, t includes the 


m/f 
time spent sending and receiving messages, plus the time spent 
waiting for messages to arrive. The computing time, t 13 


af 


the time each node spent completing its respective tasks. 
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Figure 4.5 P°T-4 Execution Times 


As seen by the times listed in Table 4.1, two 
problems with the algorithm become evident. PiErse, 
communication time outweighs the actual computation time for 
each node. The causes of the long communication time are 
number of messages required by this specific partition of 


tasks and synchronization problem of nodes waiting to receive 
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Table 4.1 P°T-4 Execution Time Breakdown 
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computed values from other nodes. second and most 
importantly, although the actual computation time was reduced, 
the total execution time of the parallel algorithm is longer 
than the serial algorithm implemented by PPT2. 

(2) Improvements. The major source of the problem 
is the communication to computation ratio. This parallel 
algorithm using four nodes requires 23 messages among the 
nodes. The NAVSPASUR model is not computationally intensive 
enough to offset this amount of communication. To improve 
performance this ratio must be reduced. 

One method to reduce the communication to 
computation ratio is to reduce the amount of communication. 
One way to reduce the number of communications is to 
restructure the partitions. However, other partitions using 


four nodes were analyzed, yet none could significantly reduce 
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the number of messages. One alternative to possibly achieve 
any speedup was to partition the computations among fewer 
nodes. The diagram in Figure 4.6 depicts the distribution of 
tasks for a two node algorithm P°T-2. Although the algorithm 
displayed potential in reducing the total execution time to 
less than PPT2, speedup would be further bounded by two. A 
speedup of two would not outweigh the costs in procuring a 
parallel computer merely for Satellite propagation. 

A second alternative for reducing the 
communication to computation ratio is to somehow increase the 
amount of computation between messages. The amount of 
computation could be increased by computing the intermediate 
values for n satellites in an array and sending the array in 
one message. The communication would remain essentially 
constant and the computation between messages would increase 
by a factor of n. An estimate of this improvement may be made 
for the mean times in Table 4.1 using speedup and efficiency. 


From Chapter III, efficiency is expressed as the following: 


Seep 2 (y) (4.1) 
Dp Soap) 
where 
t(p)—€ (pieen(p) (4.2) 


If E, is the efficiency of the P°T-4 algorithm computing n 


satellites’ values between messages, then 
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Figure 4.6 P°T-2 Algorithm 
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(4.3) 


Semvying Equation 4.1 for t(1) and substituting into Equation 


4.3 yields 
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Be 2) © 


= a (4.4) 
(né (Pp) ere 
Replacing E by 
ms ie (15), 
= A. 
P(t.(p) *t, (py) ae 
and simplifying yields 
a aes (4.6) 
eles) | 





n 
Take the limit as n goes to infinity and the upper bound for 
E. is 


n 


lim £ =_t) 


mom on SE py (4.7) 


Setting p equal to four and using values from Figure 4.1, E, 
is bounded by .48. This implies the maximum speedup of the 
modified algorithm would be bounded by 1.92. Again, the 
benefit of using this strategy is quite limited. 
2. Domain Decomposition -- P°T 

The strategy of domain decomposition is to reduce the 
NAVSPASUR model’s computation time by the concurrent 
computation of several satellites’ state vectors. Each node 
of the hypercube would complete identical tasks on different 
Satellite data sets, simultaneously. Hence, the ultimate 


objective of program set P°T was to reduce the overall 


computation time for several objects in orbit. 
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a. Algorithm 

Unlike the application of the Control 
decomposition strategy, the application of the domain 
decomposition strategy to the NAVSPASUR model was seemingly 
less arduous. First, because each node propagates satellite 
data sets independent of the other nodes, there exists no 
requirement for communication or synchronization among the 
nodes. This lack of communication simplifies the load 
balancing and sequential bottleneck problems present in the 
P°T-4 parallel algorithm. 

Second, because each node may perform. the 
satellite state vector prediction tasks serially, the existing 
subroutine PPT2 may be used with only minor modifications. 
Developing a parallel algorithm for predicting an individual 
satellite’s state vector was a major task for the control 
decomposition strategy. Additionally, by using the existing 
PPT2 code, the other tasks completed by PPT2 may be requested 
by the user using the same control variables as used by the 
original PPT2 subroutine. The P°T-4 program set was 
restricted to only predicting a satellite’s state vector. 

Finally, by using the serial subroutine PPT2, this 
strategy may be reduced to only developing an algorithm to 
distribute the data in a timely manner. Maximum efficiency 
will be achieved if the nodes do not have to wait for 


satellite data to propagate. 


67 


Intuitively, this strategy seems perfectly 
parallelizable. Although the various tasks performed by PPT2 
require different computation times, the total execution time 
for each node will be essentially the same if it is assumed 
that the various tasks are randomly distributed throughout the 
input data sets. The concern for this algorithm was the 
potential sequential bottlenecks at input/output portions of 
the program set. Reading and writing to external files can be 
very time consuming. In addition to the actual time spent 
reading/writing to an external file, a certain amount of time 
is spent to access the file. In order to minimize this time, 
the number of calls to read/write to a file should be 
minimized. 

With the specific iPSC/2 hypercube available, 
input/output is completed sequentially. Each node must 
compete with the other nodes to read and write to external 
files. To minimize time lost to accessing the file cataloging 
the set of satellites, a node was devoted to both the 
reading/distributing of input satellite data and to the 
collecting/writing of the results. The idea of using 7 Single 
node to read the data and a single node to subsequent write 
the output is simple to implement and proved to be fastest 
method to overcome the bottlenecks with the input/output. The 
remaining nodes of the hypercube implement the NAVSPASUR model 
using a slightly modified PPT2. The diagram in Figure 4.5 


depicts how the satellite data is distributed. The cost of 
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using this simple algorithm to distribute and collect the data 
is the loss of two nodes. The only restriction on the size of 
the hypercube required by P°T is that the attached cube must 
contain at least four nodes to achieve any speedup. A 
complete listing of the source code for program set, P°*T, is 


contained in Appendix D. 


b. Assessment 

(1) Results. The graph in Figure 4.8 depicts the 
mean execution time for P°T versus the number of satellites 
propagated using hypercubes of four and eight nodes 
respectively. P°T was successful in reducing the overall 
execution time to propagate several satellites. Table 4.2 
shows the speedup and efficiency of P°T for a various number 
of satellites. As seen in Table 4.2, the speedup achieved 
using all eight nodes of the hypercube was approximately three 
times larger than the speedup achieved using four nodes. With 
this parallel algorithm using six "working" nodes for an eight 
processor hypercube and only two "working" nodes for a four 
processor hypercube, an increase in speedup by approximately 
a factor of three was expected. More notable was the increase 
in efficiency using eight versus four nodes. The efficiency 
increased from .45 to .67. This increase in efficiency 
indicates that P°T applied to a hypercube of greater dimension 


could yield even greater speedup and efficiency. 
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Figure 4.7 P°T Algorithm 


Table 4.2 also indicates that P°T performance 
increased somewhat with an increase in the total number of 
satellites propagated. Because with this parallel algorithm 
the computation to communication ratio does not vary with the 
number of satellites, this small increase in performance must 


be primarily due to the diminishing impact of the algorithm’s 
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Figure 4.8 P°T Execution Times 


overhead on total execution time. This overhead includes one 
additional message containing the total number of satellites 
to propagate from the distributing node to the ers nodes; 
some small computations by working nodes to determine number 
of data sets to receive; and a halting message sent by the 


collecting node to the host once all of the nodes are 


eat 


Table 4.2 P°T Performance (satelite position predi chien) 


eras Number of Satellites a 





finished. Because these additional messages and computations. 
are only completed once in the program, the time cost 
associated with this overhead becomes negligible as the number 
of satellites propagated is increased. The speedup and 
efficiency remained fairly constant for greater than 144 
satellites. 

To estimate the impact of increasing the amount of 
computation on P°fT’s speedup and efficiency, the execution 
time to predict a satellite’s position and compute the partial 
derivatives of the orbital elements was also measured for PPT2 
and PT. These results are summarized in Table 4.3. Both 
speedup and efficiency improved with this increase in 


computation. 


oz 


Table 4.3 P°T Performance (satellite position prediction plus 
computation of partial derivatives) 


Number of Satellites Be Ee 









8 nodes 69 | 
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144 . 68 
[ee te 
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4 nodes ler 1.84 











Li 8 186 .47 
144 


ds dt o2 .46 








(2) Improvements. The performance results of this 
algorithm using only four and eight nodes indicated a 
potential increase in both speedup and efficiency if this 
algorithm could be applied to a hypercube of greater 
dimension. Because the number of working nodes is not fixed 
for this algorithm, P°T could be applied easily to any size 
hypercube with no modifications. The efficiency of the 
algorithm should increase with the cube dimension until the 
time to distribute a separate satellite data to each working 
node exceeds the time required by node to propagate a single 
satellite. Therefore, a possible improvement in the 
algorithm’s performance can be achieved by applying the 


algorithm to an optimal dimension hypercube. 
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Because the hypercube at the Naval Postgraduate 
School is limited to eight nodes, a model was used to estimate 
the optimal hypercube dimension. The total execution time for 
P°T to propagate n satellites with p processors, t(p), can be 


modeled by the following expression: 


t (p) =t,, (Pp) +t,,(p) +€.(p) (4.8) 


where t,,(p) 1s the time the last node must wait to receive its 
first satellite data set, t,,(p) is the total time the last 
node must wait to receive all of its subsequent satellite data 
sets, and t.,(p) is the time for each node to propagate each 
share of the n satellites. 

As described in previous chapter, the iPSC/2 uses 
a Direct-Connect Module (DCM). This module provides an 
essentially constant startup time for a message to be passed 
between two nodes regardless of the length of the message 
path. Hence, the time to send a message between two nodes is 
a function of only the size (number of bytes) of the message. 
Because all messages between the distributing node and working 
nodes are of a constant size (674 bytes), the time of a single 
message between the distributing node and each working node is 
essentially constant. For this algorithm, there are p-2 
working nodes. Denoting the time to send a single message 
between the distributing node and a working node as t,(1), the 


t.i(p) may be modeled by the following: 
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Eevee ae ste (1) =Kp-3)ye (1) (4.9) 


In order to determine t,(1), several experiments were run 
using the specific iPSC/2 located at the Naval Postgraduate 
wemool . The mean value of t,(1) was approximately .693 
milliseconds. 

A working node’s total wait time for subsequent 
satellite data sets, t,,(p), 18S a function of the elapsed time 
for the working node to propagate a single satellite, the 
elapsed time for the distributing node to send a subsequent 
satellite data set to the working node, and the number of 
satellites the working node must propagate. Because the 
distributing node distributes the data while the working nodes 
are computing, the wait time is zero if the subsequent 
satellite data arrives before the working node is ready to 
receive.’ If the subsequent satellite data arrives after the 
node is finished with the previous satellite, the wait time is 
the difference between the computing time for the previous 
satellite and the elapsed time for the distributing node to 
send the node another satellite data set. Because the 
distributing node must send a data set to each of the other 
working nodes prior to sending a subsequent data set to the 
last node, the elapsed time for the distributing node to send 
another data set may be also modeled by t,,(p) in Equation 4.9. 

‘If a node is not ready to receive a message from another 


node, the message is stored in a local buffer. The time to 
read from this buffer is negligible. 
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The total wait time is then the wait time for each subsequent 
satellite data set multiplied by the number of data sets 
received by each working node. Hence, by letting tl represent 
the time to propagate a single satellite, t,,(p) may be modeled 


by the expression: 


0 pig meg ye,) << 157! 


- (4.10) 
t42(P) = (S57) (E05) -e2) eae Geena 


Assuming the time for one node to propagate n 


satellites, t(1), is 


t(1)=n(t1) 
The total computation time for each working node, t,(p), may 
be approximated by the continuous function: 


Gea) 


4.11 
Ee ( ) 


seo) = 


Substituting Equation 4.8 into Equations 3.2 and 
3.3, the speedup and efficiency using a total of p processors 


may be modeled by the following expressions: 


g -£(1) ~ 2 (eae 


eC (p) “Ea py erp ec) 
(4.12) 


ee AGE) 
Fee eee 
Setting tl equal to 11.2 milliseconds and t,(p) equal to .693 


milliseconds, t(p), Sp», and E, were computed using Equations 
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oe 4.12. Figures 4.9 and 4.10 depict the estimates of 
t(p), S,, and 5, for propagating 1728 satellites using 4 to 
1024 processors (a cube dimension of 2 to 10). Using the 
above model, P°T is capable of achieving a maximum speedup 
over 16 with a corresponding efficiency of approximately 90 
percent for a hypercube of dimension 5 (2° nodes). A 
hypercube of dimension 4 achieves a speedup of nearly 14 and 
an efficiency of almost 90 percent. Although these graphs are 
only estimates of the actual values of speedup and efficiency, 
they correspond closely to the actual timed results for four 
and eight node size hypercubes and provide a good indication 
of the parallel computing potential of this algorithm for 
higher dimension hypercubes. 

Another possible improvement to this domain 
decomposition algorithm is to eliminate the need for the 
distributing and collecting nodes. Although the iPSC/2 
located at the Department of Mathematics, Naval Postgraduate 
School is not capable of concurrent input and output, 
concurrent file systems are available. Separate I/O nodes 
allow the computing nodes of a hypercube to concurrently read 
and write to external files. A concurrent file system would 
eliminate the need of the distributing and collecting nodes. 
Additionally, the INTEL Concurrent File System (CFS) allows 
for a common file pointer to be maintained among the computing 
nodes, minimizing overhead in algorithm. Depending on the 


efficiency of the I/O nodes, a further increase in the overall 
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Figure 4.9 Estimated Execution 
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Figure 4.10 Estimated Speedup and Efficiency of P°T for 
Various Hypercube Sizes 
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V. CONCLUSIONS AND SUGGESTIONS FOR FURTHER RESEARCH 


The ultimate objective of this thesis is determine the 
parallel computing potential of the NAVSPASUR satellite motion 
model. From the results given in Chapter IV, vectorization 
and control decomposition strategies proved not beneficial in 
Significantly reducing the computation time of the NAVSPASUR 
model. Very few apparent vector operations exist in the 
model, and any attempt to transform the formulas into vector 
operations became algebraically overwhelming. Although the 
analytic formulas of the NAVSPASUR model are quite lengthy and 
complex, they proved to be not computationally intensive 
enough to warrant decomposition of the algorithm by tasks. 

On the other hand, the domain decomposition strategy 
showed promise if the satellites are propagated in a batch 
mode. The P°T algorithm was simple to apply. The algorithm 
provided the flexibility to vary the dimension of the 
hypercube and to easily modify the model itself. Although 
only a maximum efficiency of .67 was achieved, the potential 
efficiency was artificially bounded by the number of nodes 
available with the specific hypercube multi-computer used. 
Having a maximum of only eight nodes available, the efficiency 
of P°T is bounded above by .75. Using the model of P°T’s total 


execution time described in Chapter IV, it was shown that a 
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maximum efficiency of just under 90 percent could be achieved 
with a hypercube consisting of 16 nodes. The corresponding 
speedup factor of nearly 14 would significantly reduce the 
time to predict the state vectors for several thousand 
satellites. 

The success of P°T manifests several possible areas of 
future research. One area would be to apply P°T to higher 
dimension cubes and validate the estimates of speedup and 
efficiency using more than eight nodes. Because the number of 
working nodes is not fixed for this algorithm, P°T could be 
applied easily to any size hypercube with no modifications. 
Once the optimal size is found, one could attach several sub- 
cubes of the optimal dimension and determine the benefit of 
propagating several smaller catalogs of satellite data instead 
of propagating one large catalog. 

Another possible area of research would be to modify the 
current satellite motion model to increase the accuracy of its 
predictions. The results in the previous chapter showed an 
increase in performance of the P°T if the amount of 
computation was increased. Hence, greater accuracy could be 
achieved in far less time using the P°T algorithm than the 
time using the original serial algorithm. Additionally, from 
these results, the parallel computing potential of satellite 
motion models that are more computationally intensive would be 


greater. For example, semi-analytic models which combine the 
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benefits of analytic and numerical models might be good 
candidates. 

Overall, the main lesson learned from this thesis is that 
satellite position prediction can be made more timely through 
parallel computing. Although the best method of 
parallelization might vary depending on the specific model 
used, parallel computing is a viable option to achieve timely 
Satellite position prediction for the growing number of 


objects in orbit around the Earth. 
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APPENDIX A 


INPUT/OUTPUT OF PPT2 


This appendix contains a listing of the primary variables 
used by NAVSPASUR subroutine, PPT2. Tables A.1 - A.4 are 
Seeracted from (Solomon, 1991, pp. 12 - 14). Table A.5 was 
interpreted from the NAVSPASUR source code. 


Table A.1 PPT2 Calling List 


output 
control variable Lo Pisco 
0 Stee 


Oefor inertial coorginates= 
1 for Earth-fixed 
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Table A.2 PPT2 Input Control Variables 





Variable 


on 


OD) (uals), 


a! | 
A 
NF OS 


K 


N 


KF (3) 


ak! 


Option 


Position, no partial derivatives | 
Position, partial derivatives 
Secular recovery 


Epoch update 


Inertial coordinates 


Barth-fixed coordinates 


Secular and drag corrections only 

Secular, drag, and periodic correcti@ 
ae | 
Secular, drag, periodic, and 
sectoral/tesseral corrections 


Not used by PPT2 
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Table A.3 PPT2 Common Blocks 

















—_ = 


output 


at 


—_ a > 


Variable 


Jat (25) 
Gse (0) 
KF (10) 
CF (10) 










Description 





constants set by 
subroutine CONS1 














stored mean elements 






osculating elements 





control variables 





used for appearance 
prediction 






observation stations 
pOSiLeton (kh, 2, 4,1)) 


ft 


BS (3, 4) 












ws) 
V (3) 
W(3) 
R 











VEL (3) r, velocity 









DCSUB PE (6, 8) partial derivatives 






others not used in 
PPT2 







RHOS p° 
HDR 2 0 
HDV ya ie 







RDV 6-x 






DEL AE 


OF @ Orres Ome © 





yteration counter 
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Table A.4 Array F 


Definition [zaput | ontput_| 
pa | ar | mean mean anomaly [| ot | oo 
poz | m |mean motion | ot | 
| 3 | | sire decayterm | ot | 0 | 

SS a a 
| s | e" eccentricity 


mean argument of 
> a 


mean /mean ascending node | node 


aed cos I" | cosine inclination = 


os | ¢ fepocns | 
p10 | # | revotution number | or | 0 
aga | | 
_ iz | avan | ee 
as |» [paar eeotanior | 
er eee. 
a 
i ee ee 


1 not used in PPT2 
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Table A.5 Array OSC 





cosine true anomaly | 


sine true anomaly 
osculating mean anomaly | 


fe __fosculating eccentricity 
1g ___fosculating argument of perigee 


secular and drag correction term for l 


p10 | | not_used by PPT2 
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APPENDIX B 


INTEL iPSC/2 SPECIFICATIONS 


This appendix contains a summary of the iPSC/2 hypercube 
multi-~computer specifications as described in (iPSC/2 User's 
Guide, 1990, pp. 1-1 - 1-11). The exact performance values 


were obtained from (Arshi, 1988, pp. 17 -— 22). 
iPSC/2 


System Resource Manager (Host 


Central Processing Unit INTEL 80386 (4 MIP) 

Numeric Processing Unit INTEL 80387 (250 KFLOP ¢4=ba) 
Memory 8 Mbyte 

External Communication Ethernet TCP/IP local area 


network port 


Operating System AT&T UNIX, Version V, Release 
Shea) 

Nodes 

Node Processor INTEL 80386 (4 MIP) 

Numeric Co-processor INTEL 80387 (250 KFLOP 64-bit) 

Operating System NX/2 
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Internal Communication 


Direct-Connect™ Routing Message Latency -- 350 usec 
Message Bandwidth -- 2.8 
Mbytes/sec 
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APPENDIX C 


P°T—-4 SOURCE CODE LISTING 


This appendix contains the listing of host and node 
programs comprising P°T-4A program set. The host program reads 
the satellite data, loads the node program, and writes the 
results to an external output file. The node program contains 
the instructions for the four nodes to complete their 
respective portions of the P°T-4 parallel algorithm. The node 


program is limited to only satellite state vector prediction. 


Host Program: 


program P3T4h 


This host program reads the satellite data froma 
external file, loads the program P3T-4n on the hypercube 
nodes, and writes results to external output file. 


+ + + + 


implicit real*S ~(a—h,o=-z) 


dimension sat (49,6000) ,end(49,1) 
integer pid,msglen,eof 


data pid/0/,msglen/392/ 
data isat/1/ 


Gall ysetpid (pic) 
m Load node program P3T4n on nodes 


print*,’ host loading nodes’ 
call load (? p3t4n* ,-—1,p1d) 


* Read satellite data and send data to nodes until reach 
* end-of-file 


open (unit=10, file=’ /usr/phipps/in4’ , form=’ unformatted’ ) 
read (unit=10,iostat=eof) (sat(j,1), j=1, 49) 


20 if (eof.ge.0) then 
call csend(5,sat(1,isat),msglen,-1,pid) 
isat=isattl 
read (unit=10,iostat=eof) (sat (j,isat),j=1,49) 


* Receive results from nodes 


90 


* 


call crecv (99, sat (1,isat-1),msglen) 
aor to 20 
endif 


Send message for nodes to halt and clear cube for next process 

call csend(5,end,msglen,-1,pid) 
close (unit=10) 
Gall killeube (-1,-1) 

Write results to external file 
open (unit=11, file=’ /usr/phipps/out4’, form=’ unformatted?’ ) 
do 22 i=1,isat-l 

ZZ write (11,*) (sat (j,i), j3=1,49) 


close (unit=11) 


stop 
end 


o1 


Node Program: 


+ + + + 


* 


program P3T4n 


This node program is a parallel code for satellite 
position prediction using the NAVSPASUR model. This 
program employs four nodes using the control 
decomposition strategy. The specific tasks for each 
node are separated by logical if statements. 
implicit real*8 (a-h,m-z) 
real*8 kz 
Declare cube specific function and variables as integers 
integer mynode,mclock,msgwait,myhost 
integer mynod,pid,msglen, hostid 
common /ppt/£ (25) ,0osc(9) ,u(3),¥v (3) /wi(s) ; vel) Seen 
common /cons/c(25) 
common /n/theta2, eta0,eta20, esq0 
common /9/92,92p,05.04705 
common s/ Chie /EZ 
common /sec/agda,agde,agdi,agdl,agdg, agdh 
data pid/0/,msglen/8/ 
hostid=myhost () 
mynod=mynode () 
Synchronize nodes 
call gsync() 
Nodes set constants and receive data from host 
call consl 
call crecv(5,f£,msglen*49) 
Nodes continue to execute ppt3 until catalog of 
satellite data is exhausted 
1100 1£(tm.eq: 0,000) go tovlloLr 


Node 2 computes new T2 and sends to other nodes 
if (mynod.eq.2)then 
Calieccrutanes 
call csend(mynod,t2,msglen,-1, pid) 
endif 
Nodes execute respective portion of subroutine ppt3 
call ppt3 (mynod) 
Node 0 send results to host 
if (mynod.eq.0)then 


call csend(99,£,msglen*49, hostid, pid) 
endif 


a2 


* Repeat until catalog of satellite data is exhausted 


call crecv(5,f,msglen*49) 


go to 1100 
EO! continue 
ws End main program 
stop 
end 


KK KK KKK HH KKK KKK KKK KKK KKK KKK KKK KKK KEK KKK KK KE KK KE KRKKKKKKKRKHK 


subroutine ppt3 (mynod) 


* This subroutine preschedules the tasks for each node 
numbered 0 through 3. Tasks are partitioned by logical 
sa if statements 


+ 


* Declare variables 


Pmpleczt real*8 (a-h,m-2Z) 
real*8 kz 


* Declare cube specific function and variables as integers 


integer mynode,mclock,msgwait 
integer mynod,pid,msglen 
integer msg4 (10) 


dimension msgl (2) ,msg3 (2) 
* Define common blocks 


common Vppe, £ (25) ,Ose(9) pu (3) pve) ,w(3) ,vel (3) Soe aU Sr 
common /cons/c (25) 

common /n/theta2,eta0,eta20,esq0 

common) /G/G2,02p, 93,94, 05 

commons, Crit/tZ 

common /sec/agda, agde,agdi, agdl, agdg, agdh 


equrvalence (msql(1l),ose(6))7 (msg3 (1) ,osc(1)) 


data pid/0/,msglen/8/ 


if (mynod.ne.2) then 
esq0=f (5) *£(5) 
theta2=f (8) *£(8) 
eta20=1.0d0-esq0 
eta0=dsqrt (eta20) 


*DDDDDDDDDDNNDNDNNONNDNONDNDNDNDDNNNDNNDNNDNNNDNNDNDNNNN0NNNNDND0D00000000 
if (mynod.eq.0) then 


a Post asynchronous receive message commands 


23 


msg4 (1) =irecv(2,t2,msglen) 

msg4 (2) =irecv (1,0sc(5),msglen) 

msg4 (3) =irecv (3,agda,msglen* 6) 
Recover a from mean orbital elements 

call recover 
Send a to all nodes 

call csend(mynod,£(13) ,msglen,-1, pid) 
Compute g2,g2p,g3,94,g95 


call gamma 
hl=tm-f (9) 


Compute osculating 1,¢,a 


osc (9)=((£ (4) *h1+f (3) ) *h1+£(2)) *hl 

osc (3)=pie (osc (9)+£(1)) 

f (14) =-4.0d0*f (13) *£ (3) / (£ (2) *3.0d0) 

f (16) =£ (5) *eta20*£f (14) /£ (13) 

osc (4) =dminl (dmaxl1 (0.0d0, £ (5) +£(16) *hl1), .99999999d0) 
osc (8) =dmaxl1 (1.0d0, £ (13) +£ (14) *h1) 


Send l, e and a to other nodes 


call message (osc (3),08sc(4),osc(8) ,mynod,-1, 0) 


Compute sin I 
£ (15) =dsqrt (1.0d0-thetazZ) 
Receive t2 from node 2 


call msgqwait (msg4 (1) ) 
msg4 (4) =irecv (2,msg3, 2*msglen) 


Make preliminary computations for edll 


tt2=theta2*t2 

pl=(-8.0d0*tt2-3.0d0) *theta2+1.0d0 
pz=(-40.d0*tt 2-11. 0d0)*thetaZz+ i 0av 

vle1=0 .125d0*g2p*p2- (5.0d0*g4*pl1/ (12.0d0*g2p) ) 
vlel=vlel*£f(5) *eta20 

pl=(-24.0d0*tt2-9.0d0) *theta2+1.0d0 
p2=(1.25d0+.9375d0*esq0) *g5 
vle2=(pl1*p2+g3) *eta20*f (15) /(4.0d0*g2p) 
v112=vle2+3 .0d0*eta20*0.15625d0*esq0*f (15) *g5*pl/g2p 
pl=(-16.0d0*ttZ—-5.0d0)-thetaz+120ay 

vle3=pl1*f (15) *eta20*esq0*g5*35.0d0/ (-3.84d02*g2p) 


Receive g from node 1 


call msgwait (msg4 (2) ) 
msg4 (5) =irecv(1,DE,msglen) 


Compute edll 
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cosg=dcos (osc(5) ) 
sing=dsin(osc(5) ) 


p1=4.0d0*cosg**3-3.0d0*cosg 
p2=2.0d0*sing*cosg 
edll=(vlel*p2-vll2*cosg-vle3*pl) *eta0 


Receive cos f, sin f from node 2 


call msgwait (msg4 (4) ) 
msg4 (6) =irecv (2,0S,msglen) 


Compute ed2l 


w22=(1.0d0+0sc (4) *osc(1)) * (2+08sc (4) *osc(1)) /eta20 
pl=(3.0d0—4.0d0*osc (2) **2) *osc (2) * (1.0d0—-2 .0d0*sing**2) 
pl-(4.0d0*ose(1) **2—-3 .0d0) *osc(1) *p2tpl 

mol 0d0—2-0a0*sing**Z) *osc (2) +osc (1) *p2 

p4=3.0d0* (1.0d0-theta2) 

p5=-1.0d0+3 .0d0*theta2 

eta30=eta20*eta0 

ed21l=(p1* (w22+1.0d0/3.0d0) +p3* (1.0d0-w22) ) *p4 
ed21=(ed2l+osc (2) * (w22+1.0d0) *p5*2.0d0) *eta30*g2p/-4.0d0 


Send ed2l1 to node 2 
call csend (mynod,ed21,msglen,2,pid) 
Compute edl anda 


edl=edllt+ed21 

ml (ecosg ~cosg-sing”sing) * (ose(1) *osc(1)—-ese(2Z2) *osc(2))- 
& me72.0a0%ose(1)*osc (2) 

p6=(1.0d0+o0sc (4) *osc(1)) **3 

osc (8) =osc (8) * (1.0d0+g2p/eta20* (p5* (p6-eta30) + 
& p4*p6*p1) ) 


Receive sectoral corrections from node 3 
call msgwait (msg4 (3) ) 
DL=edltagdl 
osc (8)=osc(8)+t+agda 


Receive DE from node 1 


call msgwait (msg4 (5) ) 
msg4 (7) =irecv (1,msgl1,msglen*2) 


Compute final value for e and l 
esq=ede**2+dl1**2 
osc (4) =dsqrt (esq) 
Sinl=dsin(osc(3) ) 
cosl=dcos (osc (3) ) 
osc (3) =artng (DE*sinl+DL*cosl, DE*cosl1-DL*sinl) 


Solve Kepler’s eqn 


call kepler 
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if (kz.ne.0.0d0) cf=c (12) 
eta2=1.0d0-esq 
r=osc (8) *eta2/(1.0d0+osc (4) *osc (1) ) 


s Receive final computations from other processors 


call msgwait (msg4 (7) ) 
call msgwait (msg4 (6) ) 


ws Compute g 
osc (5) =os-osc (3) -osc (6) 
* Compute position and velocity vectors 


cosg=dcos(osc(5) ) 
sing=dsin (osc (5) ) 
cosh=dcos (osc (6) ) 
Sinh=dsin(osc (6) ) 


pl=sing*osc(1)+cosg*osc (2) 
p2=cosg*osc (1) -sing*osc (2) 
Sini=dsqrt (1.0d0-osc(7) **2) 


u(1)=cosh*pZ-sinh*%pl “ose (7) 
u (2) =sinh*p2+cosh*pl*osc (7) 
u (3) =pl*sini 


v (1) =-cosh*pl-sinh*p2*osc (7) 
v (2) =-sinh*pl+cosh*p2*osc (7) 
v (3) =p2*sini 
w(1)=sinh*sini 
w(2)=-cosh*sini 
w (3) =osc (7) 
p3=osc (4) *osc (2) 
p4=osc (4) *osc (1) +1.0d0 
p5=dsqrt (eta2*osc (8) ) 
dow): s1=ih3 

dea vel (i) =(p3*u(i)+p4*v(i)) /p5 
vel (1) =vel (1) +cf£*r*u (2) 
vel (2) =vel (2) -cf*r*u(1) 


* End for node 0O 


endif 


*D0DDD0DD0DDDNNNNNNDNDNNDNNDNDNNDNNNNDNND0000000000000000000000000 
ALTILTILTLIL ITIL LiL ii tii Ti TP ri eee ee ee 


i: Begin node 1 


if (mynod.eq.1) then 


msg4 (1) =irecv (0, £ (13) ,msglen) 
msg4 (2) =irecv (2,t2,msglen) 
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msg4 (3) =irecv (3, adga,msglen*6) 
= Compute sin i and tan 1 


f (15) =dsqrt (1.0d0-theta2) 
tanixf (15) /£ (8) 

hl=tm-f (9) 

pl=1.0d0-5.0d0*theta2 

p2=-35.0d0+24 .0d0*eta0+25 .0d0*eta20 
p3=90.0d0-192.0d0*eta0-126.0d0*eta20 
p4=385 .0d0+360.0d0*eta0+45.0d0*eta20 
p5=-270 .0d0+126.0d0*eta20 
p6=385.0d0-189.0d0*eta20 
theta4=theta2*theta2 


* Receive a from node 0 
call msgwait (msg4 (1) ) 

ss Compute g2,g92p,g3,94,g5 
call gamma 


Compute g 
osc (9)=((£ (4) *h1+f£(3)) *h1+£(2))*h1 
me) —= 1), od0%qZ2p*pi+.09375d0*g2p**2* (p2+p3*thetaZ 
& +p4*theta4) +.3125d0*g4* (21.0d0-9.0d0*eta20+p5*theta2 
& +p6*theta4) 
oper (2) *£ (13) **1.5d0 
£(11) =f (11) /ql 
esac (a) =pie (£f (6) +£ (11) *osc (9) ) 


Send g to all nodes 
call csend (mynod,osc(5),msglen,-1, pid) 
be Complete preliminary computations to solve for dle and dli 


cosg=dcos (osc (5) ) 
Sing=dsin (osc (5) ) 
p3=(3.0d0-4.0d0*sing**2) *sing 
pam. .0d0*cosgq**Z2—-1.0d0 

pom .0d0*sing* cosg 
p6=cosg*cosg-sing* sing 


ok Reeeive t2 from node 2 


call msgwait (msqg4 (2) ) 
msg4 (4) =irecv (2,msg3,msglen*2) 


tt2=theta2*t2 

p= (—o,, 0Od0*tt 2-3 .0d0) *theta2tl1 .0d0 

pew -40.d0*tt2-11.0d0) *thetaZz+1 .0d0 

lel —-Onmtzod0*qZp*p2—-15.0d0*g4*pl/ (12.0d0*gZp) ) 
vlel=vlel*f (5) *eta20 

pl=(-24.0d0*tt2-9.0d0) *theta2+1.0d0 
pe—(le2500+.93750d0*esq0) *go 
vle2=(pl1*p2+g3) *eta20*f (15) / (4.0d0*g2p) 

pla le-cd0*tr2—5.0d0) *theta2+1 .0d0 

vle3=p1*f (15) *eta20*esq0*g5*35.0d0/ (-3 .84d02*g2p) 


a7 


Compute dle and dli 


dle=vlel*p4+vle2*sing+vle3*p3 
dli=-f (5) *dle/ (eta20*tani) 


Receive 1, e and a from node 0 
call message (osc (3),o0sc(4),osc(8) ,mynod,0,1) 
Receive cos f, sin f and r from node 2 
call msgwait (msg4 (4) ) 
Compute d2i 
pizm(4.0d0*osce(1)**2-3.0d0) “ese (1) 74 
Dl=pl—-po~ (3,0d0—4 -0d0 “ose (2) ="2Z) coca) 
p2=p4*osc (1)-p5*osc (2) 
p3= (osc (1) *ose (1) -ose (2) *osc (2) ) *p6=2 -0d0*p5“o2c (1) ~csaren 
d2i=((pl1+3.0d0*pZ) *£(5)+3-0d0"p3) “£0052 eye ae 
Compute d2e 
w20=osc (1) * (3.0d0+o0sc (4) *osc(1)* (3.0d0+osc (4) *osc(1))) 
p4=eta0+1.0d0/ (1.0d0+eta0) 
p5=1.0d0-theta2 


d2e=0 .5d0*g2p* ( (3.0d0*theta2-1.0d0) * (w20+f (5) *p4) 
& +3.0d0*p5* (w20+E£ (5) ) *p3-etaZ0*p5*{ 3. 0dC4p27 be 


Compute di and de 


di=dlitd2i 
de=dle+dZ2e 


Receive sectoral corrections 
call msgwait (msg4 (3) ) 
Compute DE and send to node 0 


DE=detosc (4) tagde 
call csend (mynod,DE,msglen, 0, pid) 


Compute DI 
pl=(£(8) +1..0d0) /220d0 
p2=dsqrt (pl) 
p3=dsqrt (1.0d0-p1) 
Dil—=ps+. 200" p2* (a2 aga) 
Receive osc(6) and DH from node 3 
call message(osc (6) ,DH, dummy, mynod, 3,1) 
Compute final cos i and h and send to node 0 
osc (7)=1.0d0-2.0d0* (DI**2+DH** 2) 
sinh=dsin (osc (6) ) 


cosh=dcos (osc (6) ) 
osc (6) =artnqg (DI*sinh+DH*cosh, DI*cosh-DH*sinh) 
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call csend(mynod,msgl,msglen*2, 0, pid) 
“ Emewot node 1 
endif 


emma L LLL 12112111112711111111111111111111111111111 
fee 3 55 5 333333333333333333333333333333333333333333333 


* Begin node 3 
if (mynod.eq.3) then 


msg4 (1) =irecv (0, £ (13) ,msglen) 
msg4 (2) =irecv (2,t2,msglen) 
msg4 (3) =irecv (1,osc (5),msglen) 


S$ini2=1.0d0-theta2 

£ (15) -dsqrt (sini2) 

hl=tm-f (9) 

osc (9) =h1* (f (2) +hl* (£ (3) +h1l*f (4) )) 
theta4=theta2*theta2 
pl=1.0d0-5.0d0*theta2 

p2=-35 .0d0+24.0d0*eta0+25.0d0*eta20 
p3=90 .0d0-192.0d0*eta0-126.0d0*eta20 
p4=385.0d0+360.0d0*eta0+45.0d0*eta20 
p5=-270.0d04+126.0d0*eta20 
p6=385.0d0-189.0d0*eta20 
p7=-5.0d0+12.0d0*eta0+9.0d0*eta20 
p8=35 .0d0+36.0d0*eta0+5 .0d0*eta20 
p9=3.0d0-7.0d0*theta2 


is Receive a from node 0 - 
call msgwait (msg4 (1) ) 
* Compute g2,92p,93,94,95 
call gamma 
- Compute h 
£ (12)=(—3 .0d0*g2p+.375d0*g2p**2 
& *Ho7—-pe*thetaZ)+1-250d0~qd* (5.0d0-3 .0d0*etaZ0) *p9) *£ (8) 
ql=f (2) *£(13) **1.5d0 
f(z )=f (12) /ql 
if (kz.ne.0.0d0) cf=c (12) 
osc (6) =pie (£ (7) +£ (12) *osc (9) ) -c£* (tm-c (1) ) 
: Send h to node 2 
call csend (mynod,osc(6),msglen, 2, pid) 
z Perform preliminary computations for sectoral corrections 
Poe ae Gd) “2p pi. 0937 5d0*GqZ2p**2* (p2+p3*thetaz 
& +p4*theta4) +.3125d0*g4* (21.0d0-9.0d0*eta20+p5*theta2 
& +p6*theta4) 


Sajmet i) gt 
agda=10.0d0 


Jo 


call sector (hl) 
Receive t2 from node 2 


call msgwait (msg4 (2) ) 
msg4 (4) =irecv (2,msg3,msglen*2) 


Complete preliminary computations to compute sini dilh 


tt2=theta2*t2 

p1=(40.0d0*tt2+16.0d0) *tt2+3.0d0 
p2=(2.0d02*tt2+80 .0d0) *tt2+11.0d0 
vlhli=(5.0d0*g4*pl1/ (12.0d0*g2p) -.125*g2p*p2) *£ (15) *esq0<£(a) 


p2=4.0d0+3 .0d0*esq0 
p3=(-24.0d0*tt2-9.0d0) *theta2+1.0d0 
vilnh2i=((£(5) *£ (8) *.25d0) /g2p) * (g34+(5-.0d0*g5/7 10. 0d0) “52 ee 
&+15.0d0*g5* (£ (15) **2) *p2*p1/8.0d0) 
p2=(=8. 0d0*ttZ2-2, Sau) tnetaz ti. oc 
p3=2.0d0*pl1-1.0d0 
vlh3i=(-35.0d0*g5*esqO*£ (5) *£(3)) “"eztes 5 Go oe 
&7 (576-000 *gZp) 
Receive l, e and a from node Q 
call message (osc(3),o0sc(4),osc(8),mynod, 0,1) 
Receive g from node 1 
call msgwait (msg4 (3) ) 
Compute sectoral corrections and send to all nodes 
agda=0 .0d0 
agde=0.0d0 
agdi=0.0d0 
agd1l=0.0d0 
agdg=0.0d0 
agdh=0 .0d0 
call sector (hl) 
call csend (mynod, agda,msglen*6,-1, pid) 
Compute sini dlh 
cosg=dcos (osc(5) ) 
Sing=dsin (osc (5)) 
pl=2.0d0*sing*cosg 
p2=(4-0d0*cosq**2—3 -0d0) “cosg 
Ssinidlh=vlhli*pl+vlh2i*cosg+vlh3i*p2 
Receive cosf, sinf from node 2 
call msgwait (msgq4 (4) ) 
Compute sini d2h 


wl7=artnq(osc(2),osc(1))+osc(4) *osc (2) ~pie (osc (3) ) 
pZ=Z,0d0*cosgq**2—-1.0d0 
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peta Od0~osgc (2) **2-3.0d0) “ose (1) 

pais. 0d0—4 .0d0*osc (2) **2) *os¢ (2) 

meson oO. Cd0O*ose (1) **2—5 .0d0) +6. 0d0*p2*ose(1) *osc (2) 
be=(pl*osc (1) tp2*osc (2) ) *3.0d0 

w21=p5+ (p6+pl*p3+p2*p4) *osc (4) 

Sinid2h=-0 .5d0*g2p*f (8) *£ (15) * (6.0d0*w17-w21) 


a Compute DH and send osc(6) and DH to node 1 


DH=0.5d0* (sinidlh+sinid2h+agdh) /dsqrt (0.5d0+0.5d0*f (8) ) 
call message (osc(6),DH,0.0d0,mynod,1, 0) 


* End of node 3 


endif 


eee 3 3 5 3933 333333333333333333333333333333333333333333 
else 
Meee O22 2222222222442 422222222222 222222222222222222222 


vs Begin node 2 


msg4 (1) =irecv (0, £ (13) ,msglen) 
msg4 (2) =irecv (1, osc (5) ,msglen) 
msg4 (3) =irecv (3,0s¢c (6) ,msglen) 


x Perform preliminary computations for dlz 


f (15) =dsqrt (1.0d0-thetaz2) 
tt2=theta2*t2 

esqO0=f (5) *f (5) 
etaZ20=1.0d0-esq0 
eta0=dsqrt (eta20) 
eta30=eta20*eta0 


8 Receive a from node 0 
call msgwait (msg4 (1) ) 
x Compute g2,92p,93,94 and g5 
call gamma 
= Compute dlz 


pl=(eta30-1.0d0) *.125d0 

p20 0d0=tEZ—1li Od0) *thetaz+l .0d0) *qZzp 
p3=10 .0d0*g4/ (3.0d0*g2p) 

p4=((-8 .0d0*tt2-3 .0d0) *theta2+1) *p3 
p5=(20.0d0*tt2+8 .0d0) *tt2+1.0d0 
p6=(10.0d0*p5+1.0d0) *g2p 

mom (2a Cd0*poTl.0d0) *p3 

e272 OG0-esq0*tec*ttez*thetaZz* (gZp—.2d0*p3) 
Po (2.0002 *CE2—-33 .0d0) *thetaZz+1.0d0) *g2p 
pl0=((-40.0d0*tt2-9.0d0) *theta2+1.0d0) *p3 
vlsl=pl1* (p2-p4) -.125d0*esq0O*f (8) * (p6-p7) 


OR 


& +p38=—.06z25d0*esq0*(po-p1G) 


pl=eta0+1.0d0/ (1.0d0+eta0) 

p2=f (8) /(1.0d0+E£ (8) ) 
p3=4.0d0+3.0d0*esq0 
p4=(-24.0d0*tt2-9.0d0) *theta2+1.0d0 
p5=(esq0-eta30) *3.0d04+11.0d0 

p6= (20. 0d0*tt2+8.0d0) *tt24+1.0d0 
p7=(pl+p2) /4.0d0 

p8=\(g34+23125d0* ps4qe4p4)7azp 

p9=p5* .15625d0*g5*p4/g2p 
pl0=(1.0d0-£ (8) ) *.46875d0* (p6*2.0d0+1.0d0) *p3*g5*f (8) /g2p 
vls2=(p7*p8+p9+p10) *£ (5) *£(15) 


p7=3.0d0*eta30-3.0d0-(2.0d0+p2) *esq0 
p8= (-16.0d0*tt2-5.0d0) *theta2+1.0d0 


p9=(1.0d0-f (8) ) *£ (8) *.060763884d0*esq0* (1.0d0+4.0d0*p6) 
vls3=(p7* .030381944d0*p8—-p9) *£ (15) *£ (5) *g5/gq2p 


Receive g from node 1 and compute diz 
call msgwait (msg4 (2) ) 
sing=dsin (osc(5) ) 
cosg=dcos(osc(5) ) 
pli=2z.0d0“sing*cosg 
pl2=2.0d0-cosgq"*-~2-1),,,0a0 
pl1l3=(cosgq*cosg-sing* sing) *“eosoq—-Z2.0d0“cosg sina sang 
d1ilz=vls1*pll1l+vls2*cosg+tvls3*p13 


Receive l, e, and a from node 0O 


call message (osc (3),o0sc(4),o0sc(8) ,mynod,0,1) 
msg4 (4) =irecv (0, ed21,msglen) 


Solve Kepler’s equation and send cosf, sinf to all nodes 


call kepler 
call csend (mynod,msg3,msglen*2,-1, pid) 


Complete preliminary computations for d2z 
wl7=artnq(osc(2),o0sc(1))+osc(4) *osc (2) -pie (osc (3) ) 
p3=(4.0d0*ose (1) **2-3.0d0) *osc (1) 
p4=(3.0d0-4.0d0*ose(2) **Z2) %ose(Z) 
p5=pl1l1* (6.0d0*ose (1) **2-3.0d0)4+6.0d0“plZ“cose(t)oseZ) 
p6=(pll*osc (1) +p1l2*osc(2)) *3.0d0 
w21=p5+ (p6+p11*p3+pl12*p4) *osc (4) 
p7=(-5.0d0*thetazt+2-0d0*£(3) 43-000) wz) 
p8=(-5.0d0*theta2+2*f (8) +1.0d0) *wl17 

Receive ed2l from node 0 
call msgwait (msg4 (4) ) 


Compute d2z 


d2z=-f (5) *ed21* (pl1—-1.0d0) 7/eta3s0—(6.0d0~pe—-c7) 425" 342500 


OZ 


* Receive h form node 3 
call msgwait (msg4 (3) ) 
* Receive sectoral corrections 
call crecv (3,agda,msglen*6) 
. Compute OS 


OS=osc (3) tosc (5) tosc (6) +d1z+d2z+agdg 


a Send OS to node 0 
call csend (mynod, 0S,msglen, 0, pid) 


* End of node 2 


Mee 2222202222220422242020420220222222222222222222222222222 
endif 
“3 Return to main program 


return 


end 
bh > a a a ed a a a a a a i a a i a i i a a a a a a a a a, a a a a 


function pie (x) 


+ 


This double precision function computes value of 
“ Semod 2*p1i 


amplicit real*s (a-h,m-z) 
pi2=6.2831853071d0 


pie=dmod (x, pi2) 
if (pie) 90, 91,91 


90 pie=pietpi2 
91 return 
end 


KKK KKK KKK KKK KKK KKK KK KKK KK KKK KKK KKKEKEKKKEKKKKKEKEKKKEKEKEKKKKEKEKHK 


PUnGCt Lon aring(tl,t2Z) 


us This double precision function computes inverse tangent of t1/t2 
~ Hor range 0 to 2*pi 


implicit real*8 (a-h,m-z) 


if (dabs (t1) -dabs (t2))100,104,104 


100 artnq=datan (t1/t2) 
mote) LOT, FOZ) 102 

a0 1 artnq=artnqgt3.14159265358979d0 
Gontou105 

a2 Piet s. 105,105 

103 artnq=artnqt6.28318530717959d0 
Goro, 105 


118) 6! 


104 artnq=atan (-t2/t1)+1.57079632679489d0 


4£(t1) 102-2105 ,405 
105 return 
end 
Se, He, eae, ee, oe, ee ee ee ee ee ee ee, te, oe te te, oe, oe, th, eth, ee eed 


subroutine recover 


* This subroutine recovers the value of a" 
* iteratively fromm 


implicit real*s" (a—-n,m_-2) 
real*8 kz 


common /ppt/f (25) ,0sc(9),u(3),v(3),w(3), vel (3); ©, em, kz 
common /cons/c(25) 
common /n/theta2,eta,eta2,esq 


f (13) =f (2) ** (-2.0d0/3.0d0) 


do 110 i=1,5 
g2p=c (3) / (£ (13) *eta2) **2 
g4=c (5) /(£(13) *eta2) **4 
pi=((35.0d0*theta2) -30.0d0) *thetaZ+370d0 
p2=25 .0d0*eta2+144.0d0*eta+105.0d0 
p3=-90 .0d0*eta2-96.0d0*eta+30.0d0 
p4=25.0d0*etat16.0d0*eta-15.0d0 
p5=3.0d0*theta2-1.0d0 


110 f£(13)=((1.0d0+1.5d0*g2Zp “et a*pot-09375d0-g2paa 


& * (p4t+theta2* (p3+p2*thetaZ) )+.9375d0*g4*eta*esq*pl) 
& f£E(2)) 4* (22.00/73. 0d0) 

return 

end 


KKK KKK KKK KKK KKK KKK KKK KKK IKK KKK KK KKK KIKI KKK KKK KKK 


subroutine gamma 
~ This subroutine computes the dimensionless quantities 


implicit real*8 (a-h,m-z) 
real*8 kz 


common /ppt/£(25),o0sc (9) ,u(3),v (3) ,w(3), vel(3) 92, em, <2 
common /cons/c (25) 

common /n/theta2,eta0,eta20,esq0 

common /g/g2,g2p,g93,94,95 


gz=c'(3)/7£ (13) **2Z 
g2p=g2/eta20**2 

g3=c (4) / (£ (13) *eta20) **3 
g4=c (5) / (£ (13) *eta20) **4 
g5=c (6) / (£ (13) *eta20) **5 


return 


end 
> > ab dt dd ae ae, a ae a a aba. de a, a de a de, ae ae a, ae a, ae a a a a a, ae ae, a de ae, a, a, a, a, ae, de, ae, He, He, oe, oh, ae, oe, a, ae, ah, ae, ah, oe, He, He ae 
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Subroutine critinel 
* This subroutine computes the value of T2 


implicit real*8 (a-h,m-z) 
real*8 kz 


common fppeyt (25) Pose (9) 743) ,y (3) ,w(3),vel (2), tm, kz 
common /n/theta2,eta0, eta20,esq0 
common /crit/t2 


theta2=f (8) *£ (8) 
tl=1.0d0-5.0d0*theta2 
beta=1.0d02/ (2.0d0**11) 


p3=beta*tl1*tl 
pl=dexp (-p3) 
p2=1.0d0+pl 
plex=pl*pl 
p4=1.0d0 
p3ex=-0.5d0*p3 


dorzl10 1=2,13 
if (i.le.11)p2=p2* (1.0d0+plex) 
plex=plex*plex 
p4=p4+p3ex 
210 p3ex=-p3*p3ex/dble (i+1) 


t2=p2*p4*beta*tl 


return 
end 
KH KKH KKK KK KKK HHH KKK KKK KKK KEKKAKKRKRKRKeEE HE 


subroutine kepler 


“ This subroutine solves Kepler’s Equation using Steffenson’s 
= Method and then computes cos f and sin f 


implicit real*8 (a-h,m-z) 
real*8 kz 


eermmone, poe/£(25);ose (9),u(3),v(3),w(s),vel(3),r,tm,kz 


e3=osc (3) 
do 410 i=1,20 
el=e3 
e3=osc (3) +osc (4) *dsin(e3) 
if (dabs (e3-el) .1lt.1.0d-08) go to 420 
e2=e3 
e3=osc (3) +osc (4) *dsin(e3) 
if (dabs (e3-e2) .1t.1.0d-08) go to 420 
410 e3=e3+ (e3-e2) **2/ (2.0d0*e2-el-e3) 


420 cosf=dcos (e3) 
el=1.0d0-osc (4) *cosf 
osc (1) =(cosf-osc(4))/el 
eta=dsqrt (1.0d0-osc (4) **2) 
osc (2)=(eta*dsin(e3))/el 


return 
end 
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subroutine sector (hl) 


* This subroutine computes sectoral correction terms to be added 
x to the variation of each of the orbital elements 


implicit real*8 (a-h,m-z) 
real*8 kz 
dimension clm(8),slm(8),tf(8),tfp(8),te(8, 8) 


common /ppt/f (25) ,0sc(9),u(3),v (3) ,wi(3), vel (3) etme ke 
common /cons/c(25) 

common /n/theta2,eta,eta2,esq 

common /sec/agda,agde, agdi,agdl,agdg, agdh 


if (agda.ne.1.0d01)go to 330 
clm(1) =0.35961113d-07 
clm(2)=0.18246786d-05 

clm (3) =0.22207398d-05 

clm (4) =clm (3) 
clm(5)=0.37098633d-06 
c1lm(6) =clm (5) 

clm(7) =0.22580118d-06 
clm(8)=clm(7) 


3lm(1)=0.24286442d01 

slm(2)=0.57568525d01 

silm(3)=0.12107857d00 

slm(4)=slm (3) 

slm(5) =0.96916794d0 

slm(6) =slm(5) 

slm(7)=0.11169656d01 

slm(8)=slm(7) ms 


te (7,1)=0.0d0 
te (7,2)=0.0d0 
te (7,3) =1.0d0 
te(7,4)=-1.0d0 
te (7,5)=1.0d0 
te (7,6)=-1.0d0 
te (7,7)=1.0d0 
te (7,8) =-1.0d0 
te (8,1)=1.0d0 
te (8,2) =2.0d0 
te (8,3) =1.0d0 
te (8,4) =1.0d0 
te (8,5) =2.0d0 
te (8, 6) =2.0d0 
te (8,7)=3.0d0 
te (8,8) =3.0d0 


feta=1.0d0+eta 
fl=1.0d0+€f (8) 

£2=1 .Od0-€£ (8) 
£3=1.0d0+3 .0d0*f (8) 
£4=1 .0d0-3 .0d0*£ (8) 
£lLSZ=£ (15) 42 (15) 
rl=f (2) *£ (11) 

r2=f (2) *£(12) —e(iZ) 


ti (1) =51.5d02£(6) 42s) 
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een eed SAO E152 

Meo O. 937 5e0"£152*£3-0.75d0*£1 
mica Or, 957 5G0~f£152*£4—-0.75d0*f2 
eo =lnovoad0*i(15)*f£1*£4 

eo =— i, 6 7 SOO ft (15) *£2* £3 

fia) ) =>. 62900*f152* FT 

meee ) =o. 6025G00*F152* £2 


me ol y= — 1). Sa0* (£ (8) **2-f£152) 

tp 2) =3.000*E (8) *£ (15) 

mapa b—t (>) ~ (lo 750d0*t (8) *£3—-2.8125d0*£152+0.75d0) 
meee = lo) = (87 5d0~£ (6) ~E4+2.8125d0*£152-0.75d0) 
Depo =ti~ (1. baisdo~t (8) *£4+3.75dad0*£152) 

mop 6) 4f2* (1. 875*2(8) *E3—3.75d0* £152) 

ioe et (15) * (11. 25d0*£ (8) *£1—5.625d0* £152) 

fea) =f (15) *(11.25d0*f (8) *£24+5.625d0* £152) 


ta=1.0d0/£(2) /£(13) **5 
flp1l=6.0d0 
tg=1.0d0/eta2/eta 
tgp=3.0d0*£ (5) *tg/eta2 


do 300 i=1,8 
mets—3) 301,302, 301 
302 f1p1=8 .0d0 
tg=tgp/3.0d0 
tgp=(1.0d0+4.0d0*esq) / (eta2**2*eta2*eta) 
ta=ta/f (13) 
S01 tai=ta*clm(i)/(r1l*te(7,1)+r2*te (8,2) ) 
te (2,i)=-tai*t£ (i) *tg*eta*te(7,1i)/£ (5) 
te (3,i)=tai*tf (i) * (te(7,1i) *£ (8) -te (8,i)) /£(15) *tg/eta 
te(4,i)=tai* (flp1*f (5) *tg-eta2*tgp) *t Ff (i) 
te(5,i)=tai* (tf (i) *£ (5) *eta/feta*tgptflpl*t£ (i) *tg 


& +£(15) /£1*t£p (i) *tg/eta) 
300 te (6,i)=tai*tfp(i) *tg/eta 
go to 340 
330 the=osc (6) 


meqez.eq.0.0cd0) the=the—c (12) * (£(9) thi-c(1)) 


do 341 i=1,8 

Sint=osc(5) *te(7,i)+the*te (8,1) -slm(i) 
cost=dcos (sint) 

Sint=dsin(sint) 

agde=agdette (2,1) *cost 
agdi=agditte (3,i) *cost 
agdl=agdl+te (4,1) *sint 
agdg=agdgtte (5,1) *sint 


4 1 agdh=agdh+te (6,1) *sint 
540 return 
end 


KRREKKKKKKKEKKKKKEKKKKKKKKKKKKKKEKKKEKKKRKKEKKKEKEKKEKKEKEKKEKKEKKKKKKKK 


subroutine message (d1,d2,da3,mynod, dest, itype) 


‘e This subroutine is used to join to disjoint variables 
ws into a single array to be sent or received in one message 


Oy! 


amplicit treal*~6(a-n7m—z) 


dimension msg2 (3) 
integer mynod,dest,pid,itype,msglen 


data pid/0/,msglen/8/ 


if (itype.eq.0) then 
msg2 (1) =dl 
msg2 (2) =d2 
msg2 (3) =d3 
call csend (mynod,msg2,msglen*3,dest, pid) 
else 
call crecv (dest,msg2,msglen*3) 
di=msg2 (1) 
d2=msg2 (2) 
d3=msg2 (3) 
endif 


return 
end 
ee ee ee ee ee ee ee ee ee 


subroutine consl 


This subroutine is used by NAVSPASUR to set the 
* constants used by the satellite motion model 


implicit real*8 (a-h,m-z) 
common /cons/c(25) 


BETA=3 98597 .62579588d0 
ERKM=6378.135d0 
FLAT=298 .26d0 


K-TERMS 


C20=-0 .4841605d-03 
C30=0.95958d-06 
C40=0.55199d-06 
C50=065875d-07 
6(3)=-0. 5007 C20-dsqre(5-0a0) 
c (4) =C30*dsqrt (7.0d0) 

c (5) =0.375d0*C40*dsqrt (9.0d0) 
c (6) =C50*dsqrt (11.0d0) 


is TWO. S 


c (7) =6.283185307179586d0 
c (16) =1.0d0/c(7) 


* HERGS/DAY , SECS/HERG, MINS/HERG 
c (9) =ERKM*dsqrt (ERKM/BETA) 


c (8) =86400.0d0/c(9) 
e(17)—144070007 cs) 
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* WE (RAD/DAY), WE- 2 PI, WE(RAD/HERG) 
c (11) =6.3003880987d0 
e(10) =c (11) —c(7) 
e12) =e (11) /e(s) 
x EARTH FLATTENING 
c(13)=(2.0d0-1.0d0/FLAT) /FLAT 
* SM/ER, KM/ER, NM/ER 
c (20) =ERKM/1.609344d0 
c (21) =ERKM 
c (22) =ERKM/1.852d0 


* —_ DEG/ RAD 
e23)=360-00d0/c(7) 


* RANGE RATE/ER/HERG TO CYCLES/SECOND - CONVERSION 
c(24)=c (21) *216.980d+06/ (c (9) *2.997925d+05) 


x FENCE PLANE DISPLAC FROM EARTH CENTER 
G25) —-0-31000d-02 


return 


end 
bd, db, a, i, a>, i a i, a, a a, a, a, a, a, i, a a Oe, a ae a, ae a a a a, ae a, a, ae ee a, a ee a ee, a, ae, a, ae a, a de ae, oe, ae ae, Oe, oe, ae, ee ae oe 
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APPENDIX D 


P°T SOURCE CODE LISTING 


This appendix contains the listing of host and node 
programs comprising P°T program set. The host program loads 
the node program and clears the nodes once the process is 
complete. The node program contains the instructions for the 
nodes to complete their respective portions of the P°T 
parallel algorithm. The node program assigns Node 0 to be the 
distributing node, the highest numbered node to be the 
collecting node, and the remaining nodes to be the working 
nodes. The working nodes execute the original NAVSPASUR 


subroutine PPT2 with only minor modifications. 


Host Program: 


Program E3tGh 


This host program loads the node program P3tn on the 
*% nodes of the attached hypercube. Upon completion of 
- the catalog of satellite data, program clears nodes 
me for another process. 
x Set host specific parameters 

data pid/0/ 
td Set process id 

call setpid (pid) 
a) Load program P3tn on the nodes 

print*,’ loading nodes’ 

Call load{’ pstn’ ,-1;,p2d) 
* Receive message that nodes are complete 

call crecv (99, istop, 4) 

print*,’nodes complete’ 
* Kill process on nodes 


dre@ 





ean. kil leube (-1,—-1) 


stop 
end 


ae 


data 


Node Program: 


program P3Tn 
This program propagates n-2 satellites concurrently, 
where n is the number of nodes belonging to the 
attached cube. Node 0 is the distributing node and 
Node n-1 is the collecting node. The remaining nodes 
are the working nodes that propagate the satellites 
using NAVSPASUR subroutine PPT2. For simplicity, 
the tasks for all nodes are combined on this one node 
program. Tasks are partitioned by logical if statements. 


+ + + + F&F F HF 


implicit real*8 (a-h,o-z) 
real*8 k£f(10) 


integer mynod, numnode, hostid, pid 
integer mynode, numnodes,myhost,mclock 
integer eof 

dimension ar(6,8) , br (3; 6) ,cr (3, 6) ,dav3) 


common/cons/a (64) 


common/ppt/£ (25) ,osc (10) ,k£ (10) , c£(10) ,bs (3, 4) ,u(3),v (3) (4G eee 


& vel (3) ,dind,; tmpdkz-dident 
common/dcsub/pe (6, 8) ,e(8, 8) ,ep (8, 8) ,g(8) ,gp(8) ,1£ti (8) /1ft]evee 
& iteri, itero, jJo£, jol; stat (20) ,tol(6)7 iw,of(l)) -owlore, 


common/foreo/rho(3),ros,hdr,hdv, rdv,del,iter 
common/bloc/sat (84, 4800) 


data istop/1/,pid/0/,msglen/672/ 
data isat/1/,n/1/ 


mynod=mynode () 
numnode=numnodes () 
hostid=myhost () 


*DDDDDDDNDDDDNDNDNDNDNDONDNDDNDDNDNDDNDNDDDNNDDNDNNDNNN00000000 
* Node 0 reads and distributes data among the working 
Hd nodes 


if (mynod.eq.0) then 
* Read complete catalog of satellite data 


open (unit=10, file=’ /usr/phipps/in’ , form=’ unformatted’ ) 
read (unit=10,1o0stat=eof) (sat (j,1), j=1, 84) 


1200 if(eof.ge.0)then 
isat=isattl 
read (unit=10,i0stat=eof) (sat (j,isat), jJ=1, 84) 
go to 2200 
endif 
close (unit=10) 


* Send number of data sets to all nodes 
t O=mclock () 


isat=isat-l 
call csend (mynod,isat,4,-—-1, pid) 


Laz 


x Distribute satellite data sets to working nodes 


iter=isat/ (numnode-2) 
do 1201 j=1,iter 
do 1201 i=1,numnode-2 
call csend(mynod, sat (1,n) ,msglen,i, pid) 


iL One n=n+1 

iter=mod (isat, numnode-2) 

n=n-1 

do 1202 j=1,iter 
uZ02 call csend (mynod, sat (1,n+j),msglen, j, pid) 
* End Node 0 


*00000000000000000000000000000000000000000000000000000 
* 


xs Begin working nodes 


else 
if (mynod.1t .numnode-1)then 


* Use subroutine consl to set constants 
call consl 
= Receive number of data sets from Node 0 and compute total number 


of satellites to propagate 


call crecv(0,isat, 4) 

tO0=mclock () a 

if (mynod.le.mod(isat,numnode-2) ) then 
iter=isat/ (numnode-2) +1 


else 
iter=isat/ (numnode-2) 
endif 
* Receive satellite data, execute PPT2, and send results to 
x collecting node 


do 1220 i=1, iter 
call crecv(0,f£,msglen) 


= Set parameters for subroutine ppt2 


ind=1 
kz=idint (dkz) 


* Compute secular recovery 
Gall ppt 2(ind,kz) 
x Compute subsequent task, ie. predict position, update elements 


ind=idint (dind) 
Galt ppez (ind, kz) 


220 call csend (mynod, f,msglen, numnode-1, 0) 


LS 


* End Working Nodes 


* 


* CCCCCCCCCCCCCCCCC CCCCCECECCCCCC CCC EC CEC CEC eee eee ee er ce 
x Begin Collecting Nodes 


else 
* Receive total number of satellite sets 
call crecv(0,isat, 4) 
ws Collect results from Working Nodes 
do 1230 i=1,isat 
uZ50 call crecv(-1,sat(1,i),msglen) 
* Write results to external file 
open (unit=11, file=’ /usr/phipps/out’ , form=’ unformatted’ ) 
do 1231 :i=1,1isat 
250 write (unit=11,*) (sat(j,1i),j=1, 84) 
close (unit=11) 
* Send message to Host that process is complete 
call csend(99,istop, 4, hostid, pid) 
* End Collecting Node 


* CCCCCECECCCC GEE CECE CEC CCC EC CC CECE CCC CCC CC CC CCC Cee ee sere 


endif 
endif 


stop 
end 
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